Calculating equipment for solving systems of linear equations.

A linear calculating equipment comprises a memory for storing a coefficient matrix, a known vector and an
unknown vector of a given system of linear equations, a pivoting device for choosing pivots of the matrix, a
plurality of preprocessors for executing K steps of preprocessing for multi-pivot simultaneous elimination, an
updating device for updating the elements of the matrix and the components of the vectors, a register set for
storing values of the variables, a back-substitution device for obtaining a solution, and a main controller for
controlling the linear calculating equipment as a whole.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to calculating equipment for solving systems of linear equations, parallel
calculating equipment for solving systems of linear equations, and methods of parallel computation for
solving systems of linear equations.

2. Description of the Related Art

The need to solve systems of linear equations at high speed arises frequently in numerical analysis,
for example in the finite element method and the boundary element method, and in other processes of
technical calculation.

Among algorithms based on direct methods of solving systems of linear equations is the Gauss elimination
method based on bi-pivot simultaneous elimination, which is described in Takeo Murata, Chikara Okuni and
Yukihiko Karaki, "Super Computer-Application to Science and Technology," Maruzen, 1985, pp. 95-96. The
bi-pivot simultaneous elimination algorithm eliminates two columns at the same time by choosing two pivots
at one step. It limits simultaneous elimination to two columns and the choice of pivots to partial pivoting by
row interchanges. Furthermore, it considers speeding up the process only in terms of the number of
repetitions of do-loops.

If simultaneous elimination is not limited to two columns but is extended to more than two columns, the
corresponding algorithms are hereafter called multi-pivot simultaneous elimination algorithms.
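
As an illustrative sketch only (the symbols $r_0$, $r_1$ and the primes are notation introduced here, not taken from the cited references), eliminating two columns at once from a row $i$ amounts to folding two successive single-pivot updates into one pass over the row. Assume the first pivot row has been normalized to $a'_{1j} = a_{1j}/a_{11}$ and the second to $a''_{2j} = (a_{2j} - a_{21} a'_{1j}) / (a_{22} - a_{21} a'_{12})$:

```latex
\begin{aligned}
r_0 &= a_{i1}, \\
r_1 &= a_{i2} - r_0\, a'_{12}, \\
a''_{ij} &= a_{ij} - r_0\, a'_{1j} - r_1\, a''_{2j}, \qquad j \ge 3 .
\end{aligned}
```

The last line is what two single-pivot steps would produce, since $a'_{ij} = a_{ij} - r_0 a'_{1j}$ and $a''_{ij} = a'_{ij} - r_1 a''_{2j}$. With more than two pivots the same pattern simply needs one multiplier per pivot for each row.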

An algorithm similar to multi-pivot simultaneous elimination algorithms is described in Jim Armstrong,
"Algorithm and Performance Notes for Block LU Factorization," International Conference on Parallel
Processing, 1988, Vol. 3, pp. 161-164. It is a block LU factorization algorithm intended to speed up matrix
operations and is to be implemented on vector computers or computers with a few multiplexed processors.

Therefore, the prior art has not yet provided a Gauss elimination method or a Gauss-Jordan elimination
method that is based on multi-pivot simultaneous elimination and can be efficiently implemented on
scalar computers and parallel computers.

SUMMARY OF THE INVENTION 

The object of the present invention is therefore to provide high-speed parallel calculating equipment
and methods of parallel computation for solving systems of linear equations by means of the Gauss
elimination method and the Gauss-Jordan method based on multi-pivot simultaneous elimination.

In order to achieve the aforementioned objective, according to one aspect of the present invention, 
there are provided 

a memory that stores reduced coefficient matrices $A^{(r)}$ with zeroes generated from the first to the r-th
column and corresponding known vectors $b^{(r)}$ and an unknown vector x expressed by

$$A^{(r)} = \left(a_{ij}^{(r)}\right),\ 1 \le i, j \le n, \qquad b^{(r)} = \left(b_1^{(r)}, b_2^{(r)}, \ldots, b_n^{(r)}\right)^t, \qquad x = (x_1, x_2, \ldots, x_n)^t \tag{1}$$

for a given system of linear equations

$$A^{(0)} x = b^{(0)}, \tag{2}$$

a pivot choosing section that is connected to the memory, chooses a pivot in the i-th row of $A^{(i-1)}$, and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that, immediately after the pivot choosing section's above operation
determines the transposed pivot

$$a_{pk+1\,pk+1}^{(pk)}, \tag{3}$$

calculates

$$a_{pk+1\,j}^{(pk+1)} = a_{pk+1\,j}^{(pk)} / a_{pk+1\,pk+1}^{(pk)} \tag{4}$$

for pk + 2 ≤ j ≤ n and

$$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1\,pk+1}^{(pk)}, \tag{5}$$

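k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the memory and
calculates

$$Reg_{pk+t}^{(0)} = a_{pk+t\,pk+1}^{(pk)}, \tag{6}$$

$$Reg_{pk+t}^{(1)} = a_{pk+t\,pk+2}^{(pk)} - Reg_{pk+t}^{(0)}\, a_{pk+1\,pk+2}^{(pk+1)}, \tag{7}$$

$$\vdots$$

$$Reg_{pk+t}^{(t-2)} = a_{pk+t\,pk+t-1}^{(pk)} - \sum_{u=1}^{t-2} Reg_{pk+t}^{(u-1)}\, a_{pk+u\,pk+t-1}^{(pk+u)}, \tag{8}$$

$$a_{pk+t\,j}^{(pk+t-1)} = a_{pk+t\,j}^{(pk)} - \sum_{u=1}^{t-1} Reg_{pk+t}^{(u-1)}\, a_{pk+u\,j}^{(pk+u)}, \tag{9}$$

$$b_{pk+t}^{(pk+t-1)} = b_{pk+t}^{(pk)} - \sum_{u=1}^{t-1} Reg_{pk+t}^{(u-1)}\, b_{pk+u}^{(pk+u)} \tag{10}$$

for pk + t ≤ j ≤ n, and, immediately after the pivot choosing section determines the transposed pivot

$$a_{pk+t\,pk+t}^{(pk+t-1)}, \tag{11}$$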

EP 0 523 544 A2 

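calculates

$$a_{pk+t\,j}^{(pk+t)} = a_{pk+t\,j}^{(pk+t-1)} / a_{pk+t\,pk+t}^{(pk+t-1)}, \tag{12}$$

$$b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t\,pk+t}^{(pk+t-1)} \tag{13}$$

for pk + t + 1 ≤ j ≤ n,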

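an updating section B that is connected to the memory, comprises a set of k registers and an
arithmetic unit, and calculates

$$Reg_i^{(0)} = a_{i\,pk+1}^{(pk)}, \tag{14}$$

$$Reg_i^{(1)} = a_{i\,pk+2}^{(pk)} - Reg_i^{(0)}\, a_{pk+1\,pk+2}^{(pk+1)}, \tag{15}$$

$$\vdots$$

$$Reg_i^{(k-1)} = a_{i\,pk+k}^{(pk)} - \sum_{t=1}^{k-1} Reg_i^{(t-1)}\, a_{pk+t\,pk+k}^{(pk+t)}, \tag{16}$$

$$a_{ij}^{((p+1)k)} = a_{ij}^{(pk)} - \sum_{t=1}^{k} Reg_i^{(t-1)}\, a_{pk+t\,j}^{(pk+t)}, \tag{17}$$

$$b_i^{((p+1)k)} = b_i^{(pk)} - \sum_{t=1}^{k} Reg_i^{(t-1)}\, b_{pk+t}^{(pk+t)} \tag{18}$$

for (p + 1)k + 1 ≤ i, j ≤ n, retaining the values of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,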

a back-substitution section that is connected to the memory and obtains the value of the unknown
vector x by calculating

$$x_i = b_i^{(n)} \tag{19}$$

and

$$b_h^{(n)} \leftarrow b_h^{(n)} - a_{hi}^{(n)} x_i \tag{20}$$

for 1 ≤ h ≤ i - 1 for i = n, n - 1, ..., 1 in this order of i, and

a main controller G that, if n is a multiple of k, instructs the pivot choosing section, the preprocessing
sections $A_1, \ldots, A_k$, and the updating section B to repeat their above operations for p = 0, 1, ..., n/k - 2,
and instructs the pivot choosing section and the preprocessing sections $A_1, \ldots, A_k$ to execute their above
operations for p = n/k - 1, and, if n is not a multiple of k, instructs the pivot choosing section, the
preprocessing sections $A_1, \ldots, A_k$, and the updating section B to repeat their above operations for p = 0,
1, ..., [n/k] - 1, where [x] denotes the greatest integer less than or equal to x, and instructs the pivot
choosing section and the preprocessing sections $A_1, \ldots, A_{n-[n/k]k}$ to execute their above operations, and in
both cases, instructs the back-substitution section to obtain the unknown vector x.
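
The division of labour among the sections above can be sketched in software. The following Python function is a minimal sequential sketch, not the claimed equipment. The names `solve_multi_pivot`, `choose_pivot`, `normalize` and `reduce_row` are hypothetical, and the pivot rule (largest absolute value in the row) is an assumption, since the text only states that a pivot is chosen in the i-th row. The list `regs` stands in for the register set, and the multipliers are formed as in an ordinary Gauss elimination with k columns eliminated per pass.

```python
def solve_multi_pivot(a, b, k):
    """Solve a x = b by Gauss elimination, eliminating k columns per outer step."""
    n = len(b)
    a = [row[:] for row in a]      # work on copies of the inputs
    b = b[:]
    perm = list(range(n))          # perm[pos] = original column now at position pos

    def choose_pivot(row):
        # Assumed pivot rule: largest entry of the row from the diagonal onward,
        # followed by a column interchange applied to every row.
        col = max(range(row, n), key=lambda j: abs(a[row][j]))
        if a[row][col] == 0.0:
            raise ValueError("matrix is singular")
        if col != row:
            for r in a:
                r[row], r[col] = r[col], r[row]
            perm[row], perm[col] = perm[col], perm[row]

    def normalize(row):
        # Divide the pivot row and its right-hand side by the pivot.
        pivot = a[row][row]
        for j in range(row, n):
            a[row][j] /= pivot
        b[row] /= pivot

    def reduce_row(i, first, count):
        # Eliminate `count` columns starting at `first` from row i in one pass,
        # keeping the multipliers in a small register list.
        regs = []
        for u in range(count):
            col = first + u
            regs.append(a[i][col] - sum(regs[m] * a[first + m][col] for m in range(u)))
        for j in range(first + count, n):
            a[i][j] -= sum(regs[m] * a[first + m][j] for m in range(count))
        b[i] -= sum(regs[m] * b[first + m] for m in range(count))
        for u in range(count):
            a[i][first + u] = 0.0

    for first in range(0, n, k):               # controller loop over blocks of k pivots
        size = min(k, n - first)               # the last block may be shorter than k
        for t in range(size):                  # preprocessing of the pivot rows
            row = first + t
            if t > 0:
                reduce_row(row, first, t)
            choose_pivot(row)
            normalize(row)
        for i in range(first + size, n):       # updating of the rows below the block
            reduce_row(i, first, size)

    x_perm = b[:]                              # back-substitution, column by column
    for i in range(n - 1, -1, -1):
        for h in range(i):
            x_perm[h] -= a[h][i] * x_perm[i]
    x = [0.0] * n
    for pos, original in enumerate(perm):      # undo the column interchanges
        x[original] = x_perm[pos]
    return x
```

Each row below the block is read and written once per block rather than once per pivot, which is the point of keeping k multipliers per row. The result does not depend on k, so `solve_multi_pivot(a, b, 1)` can serve as a reference when checking larger block sizes.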
According to another aspect of the present invention there are provided

a memory that stores coefficient matrices $A^{(r)}$, known vectors $b^{(r)}$ and the unknown vector x expressed
by (1) for a given system of linear equations (2),

a pivot choosing section that is connected to the memory, chooses a pivot in the i-th row of $A^{(i-1)}$, and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that, immediately after the pivot choosing section's above operation
determines the transposed pivot (3), calculates (4) for pk + 2 ≤ j ≤ n and (5),

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the memory,
calculates (6), (7), ..., (10) for pk + t ≤ j ≤ n, and, immediately after the pivot choosing section determines
the transposed pivot (11), calculates (12) and (13) for pk + t + 1 ≤ j ≤ n,

an updating section B' which is connected to the memory, comprises a set of k registers and an
arithmetic unit, and calculates (14), (15), ..., (18) for 1 ≤ i ≤ pk, (p + 1)k + 1 ≤ i ≤ n, (p + 1)k + 1 ≤ j ≤ n
if n is a multiple of k or p < [n/k] and for 1 ≤ i ≤ [n/k]k, [n/k]k + 1 ≤ j ≤ n otherwise, retaining the values
of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,

k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to the memory
and calculates

[Equations (21) to (29): the displayed formulas are not recoverable from the source.]

for pk + t + 2 ≤ j ≤ n,

a main controller J that, if n is a multiple of k, instructs the pivot choosing section, the preprocessing
sections $A_1, \ldots, A_k$, the updating section B', and the postprocessing sections $C_1, \ldots, C_{k-1}$ to repeat their
above operations for p = 0, 1, ..., n/k - 1, and, if n is not a multiple of k, instructs the pivot choosing
section, the preprocessing sections $A_1, \ldots, A_k$, the updating section B', and the postprocessing sections
$C_1, \ldots, C_{k-1}$ to repeat their above operations for p = 0, 1, ..., [n/k] - 1, and instructs the pivot choosing
section, the preprocessing sections $A_1, \ldots, A_{n-[n/k]k}$, the updating section B', and the postprocessing
sections $C_1, \ldots, C_{n-[n/k]k}$ to execute their above operations for p = [n/k].
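
A Gauss-Jordan variant of this arrangement can be sketched as follows. This is a hedged illustration, not the claimed equipment: the function name `gauss_jordan_multi_pivot` and the helper `reduce_row` are hypothetical, the pivot rule (largest absolute value in the row) is assumed, and the post-elimination step is written as ordinary backward elimination inside the block rather than as the formulas (21) to (29). The updating step is applied to rows above as well as below the current block, so no back-substitution is needed at the end.

```python
def gauss_jordan_multi_pivot(a, b, k):
    """Solve a x = b by Gauss-Jordan elimination with k pivots per outer step."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    perm = list(range(n))                      # perm[pos] = original column at pos

    def reduce_row(i, first, count):
        # Eliminate `count` block columns from row i using `count` multipliers.
        regs = []
        for u in range(count):
            col = first + u
            regs.append(a[i][col] - sum(regs[m] * a[first + m][col] for m in range(u)))
        for j in range(first + count, n):
            a[i][j] -= sum(regs[m] * a[first + m][j] for m in range(count))
        b[i] -= sum(regs[m] * b[first + m] for m in range(count))
        for u in range(count):
            a[i][first + u] = 0.0

    for first in range(0, n, k):
        size = min(k, n - first)
        for t in range(size):                  # preprocessing of the pivot rows
            row = first + t
            if t > 0:
                reduce_row(row, first, t)
            col = max(range(row, n), key=lambda j: abs(a[row][j]))
            if a[row][col] == 0.0:
                raise ValueError("matrix is singular")
            if col != row:                     # column interchange
                for r in a:
                    r[row], r[col] = r[col], r[row]
                perm[row], perm[col] = perm[col], perm[row]
            pivot = a[row][row]
            for j in range(row, n):
                a[row][j] /= pivot
            b[row] /= pivot
        for i in range(n):                     # updating: rows above and below the block
            if i < first or i >= first + size:
                reduce_row(i, first, size)
        for t in range(size - 1, 0, -1):       # assumed post-elimination inside the block
            src = first + t
            for i in range(first, src):
                factor = a[i][src]
                for j in range(src, n):
                    a[i][j] -= factor * a[src][j]
                b[i] -= factor * b[src]

    x = [0.0] * n
    for pos, original in enumerate(perm):      # undo the column interchanges
        x[original] = b[pos]
    return x
```

After the last block the matrix has been reduced to the identity, up to the recorded column interchanges, so the solution is read directly from the right-hand side.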

According to another aspect of the present invention there is provided a system of nodes $\alpha_0, \ldots,
\alpha_{P-1}$, each of which is connected to each other by a network and comprises:

a memory that stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k components
of each known vector $b^{(r)}$ and an unknown vector x expressed by (1) for a given system of linear equations
(2),

a pivot choosing section that is connected to the memory, chooses a pivot in the i-th row of $A^{(i-1)}$, and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that is connected to the memory and calculates (4) for pk + 2 ≤ j ≤ n and
(5),

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the memory,
calculates (6), (7), ..., (10) for pk + t ≤ j ≤ n, and calculates (12) and (13) for pk + t + 1 ≤ j ≤ n,

an updating section B that is connected to the memory, comprises a set of k registers and an
arithmetic unit, and calculates (14), (15), ..., (18) for (p + 1)k + 1 ≤ j ≤ n retaining the values of
$Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,

a back-substitution section that is connected to the memory and obtains the unknown x by
back-substitution, that is, by calculating (19) and (20),

a gateway that is connected to the memory and is a junction with the outside, and

a transmitter that is connected to the memory and transmits data between the memory and the outside
through the gateway.

If the (pk + 1)th through (p + 1)k-th rows of $A^{(0)}$ and corresponding components of $b^{(0)}$ and x are
assigned to the node $\alpha_u$, then the pivot choosing section of the node $\alpha_u$ determines the pivot (3), and the
preprocessing section of the node $\alpha_u$ calculates (4) and (5) for pk + 2 ≤ j ≤ n, and the transmitter transmits
the results to the memory of every other node through the gateway, while the updating section B of the
node in charge of the i-th row calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of
operations is below called parallel preprocessing $A_1$.

The preprocessing section $A_t$ of the above node $\alpha_u$ calculates (6), (7), (8), (9), (10) for pk + t ≤ j ≤ n,
and, immediately after the pivot choosing section of $\alpha_u$ determines the pivot (11), calculates (12) and (13)
for pk + t + 1 ≤ j ≤ n, and the transmitter transmits the results to the memory of every other node through
the gateway, while the updating section B of the node in charge of the i-th row calculates

[Equation (30): the displayed formula is not recoverable from the source.]

for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel preprocessing
$A_t$, where 2 ≤ t ≤ k.

The updating section B of each node in charge of the i-th row such that (p + 1)k + 1 ≤ i ≤ n also
calculates (14) through (18) retaining the values of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set. These
operations are below called parallel updating B.

According to a further aspect of the present invention there is provided a main controller $G_p$ that is
connected to the system of nodes by the network, distributes and assigns the rows of the coefficient matrix
$A^{(0)}$ and the components of $b^{(0)}$ and x to the nodes in such a manner that each block of k consecutive rows
and the corresponding 2k components is transmitted to the memory of one node in the cyclic order of
$\alpha_0, \ldots, \alpha_{P-1}, \alpha_0, \alpha_1, \ldots$, and, if n is a multiple of k, instructs each node to execute parallel
preprocessing $A_1$ through $A_k$ and parallel updating B for p = 0, 1, ..., n/k - 1, and, if n is not a multiple
of k, instructs each node to execute parallel preprocessing $A_1$ through $A_k$ and parallel updating B for
p = 0, 1, ..., [n/k] - 1 and to execute parallel preprocessing $A_1$ through $A_{n-[n/k]k}$ for p = [n/k], and
instructs the nodes to obtain the unknown vector by means of back-substitution.
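
The cyclic assignment of row blocks to nodes can be illustrated with a short Python sketch. The function names `owner_of_row` and `rows_per_node` are hypothetical, rows are numbered from 1 as in the text, and nodes are numbered from 0.

```python
def owner_of_row(i, k, num_nodes):
    """Index of the node holding 1-based row i when blocks of k rows are dealt out cyclically."""
    block = (i - 1) // k           # 0-based index of the block containing row i
    return block % num_nodes       # blocks go to nodes 0, 1, ..., P-1, 0, 1, ...


def rows_per_node(n, k, num_nodes):
    """Map each node index to the list of rows it stores for an n-row system."""
    layout = {u: [] for u in range(num_nodes)}
    for i in range(1, n + 1):
        layout[owner_of_row(i, k, num_nodes)].append(i)
    return layout
```

For n = 10, k = 2 and three nodes, node 0 holds rows 1, 2, 7, 8, node 1 holds rows 3, 4, 9, 10, and node 2 holds rows 5, 6. Dealing the blocks out cyclically keeps every node supplied with rows that still need updating until late in the elimination.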

According to another aspect of the present invention there is provided a system of nodes $\alpha_0, \ldots,
\alpha_{P-1}$, each of which is connected to each other by a network and comprises:

a memory that stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k components
of each known vector $b^{(r)}$ and an unknown vector x expressed by (1) for a given system of linear equations
(2),

a pivot choosing section that is connected to the memory, chooses a pivot in the i-th row of $A^{(i-1)}$, and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that is connected to the memory and calculates (4) for pk + 2 ≤ j ≤ n and
(5),

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the memory,
calculates (6), (7), ..., (10) for pk + t ≤ j ≤ n, and calculates (12) and (13) for pk + t + 1 ≤ j ≤ n,

an updating section B' that is connected to the memory, comprises a set of k registers and an
arithmetic unit, and calculates (14), (15), ..., (18) for (p + 1)k + 1 ≤ j ≤ n retaining the values of
$Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,

k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to the memory
and calculates (21), (22), ..., (29) for pk + t + 2 ≤ j ≤ n,

a gateway that is connected to the memory and is a junction with the outside, and

a transmitter that is connected to the memory and transmits data between the memory and the outside
through the gateway.

If the (pk + 1)th through (p + 1)k-th rows of $A^{(0)}$ and corresponding components of $b^{(0)}$ and x are
assigned to the node $\alpha_u$, then the pivot choosing section of $\alpha_u$ determines the pivot (3), and the
preprocessing section of $\alpha_u$ calculates (4) and (5) for pk + 2 ≤ j ≤ n, and the transmitter transmits the
results to the memory of every other node through the gateway, while the updating section B' of the node
in charge of the i-th row calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of
operations is below called parallel preprocessing $A_1$.

The preprocessing section $A_t$ of the node $\alpha_u$ calculates (6), (7), (8), (9), (10) for pk + t ≤ j ≤ n, and,
immediately after the pivot choosing section of $\alpha_u$ determines the pivot (11), calculates (12) and (13) for
pk + t + 1 ≤ j ≤ n, and the transmitter transmits the results to the memory of every other node through the
gateway, while the updating section B' of the node in charge of the i-th row calculates (30) for every i such
that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel preprocessing $A_t$, where
2 ≤ t ≤ k.

The updating section B' of each node in charge of the i-th row such that 1 ≤ i ≤ pk or (p + 1)k + 1 ≤ i
≤ n if n is a multiple of k or p < [n/k] and 1 ≤ i ≤ [n/k]k otherwise also calculates (14) through (18) for
(p + 1)k + 1 ≤ j ≤ n if n is a multiple of k or p < [n/k] and for [n/k]k + 1 ≤ j ≤ n otherwise, retaining the
values of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set. These operations are below called parallel updating B'.

The postprocessing section $C_t$ of the above node $\alpha_u$ calculates (21), (22), ..., (29) for pk + t + 2 ≤ j ≤
n for t = 1, 2, ..., k - 1 if n is a multiple of k or p < [n/k] and for t = 1, 2, ..., n - [n/k]k otherwise. This
series of operations is below called post-elimination C.

According to a further aspect of the present invention there is provided a main controller $J_p$ that is
connected to the system of nodes by the network, distributes the rows of the coefficient matrix $A^{(0)}$ and the
components of $b^{(0)}$ and x to the nodes in such a manner that each block of k consecutive rows and the
corresponding 2k components is transmitted to the memory of one node in the cyclic order of
$\alpha_0, \ldots, \alpha_{P-1}, \alpha_0, \alpha_1, \ldots$, and, if n is a multiple of k, instructs each node to execute parallel
preprocessing $A_1$ through $A_k$, parallel updating B' and post-elimination C for p = 0, 1, ..., n/k - 1, and,
if n is not a multiple of k, instructs each node to execute parallel preprocessing $A_1$ through $A_k$, parallel
updating B' and post-elimination C for p = 0, 1, ..., [n/k] - 1 and to execute parallel preprocessing $A_1$
through $A_{n-[n/k]k}$, parallel updating B', and post-elimination C for p = [n/k].

According to another aspect of the present invention there is provided an element processor comprising:

a pivot choosing section that, for coefficient matrices $A^{(r)}$, known vectors $b^{(r)}$ and an unknown vector x
expressed by (1) for a given system of linear equations (2), chooses a pivot in the i-th row of $A^{(i-1)}$ and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that is connected to the pivot choosing section and calculates (4) for pk +
2 ≤ j ≤ n and (5),

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the pivot
choosing section, calculates (6), (7), ..., (10) for pk + t ≤ j ≤ n, and calculates (12) and (13) for pk + t +
1 ≤ j ≤ n,

an updating section B which is connected to the pivot choosing section, comprises a set of k registers
and an arithmetic unit, and calculates (14), (15), ..., (18) for (p + 1)k + 1 ≤ j ≤ n retaining the values of
$Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,

a back-substitution section that is connected to the pivot choosing section and obtains the unknown x
by back-substitution, that is, by calculating (19) and (20), and

a gateway that is connected to the pivot choosing section and is a junction with the outside.

According to a further aspect of the present invention there is provided a system of clusters $CL_0, \ldots,
CL_{P-1}$, each of which is connected to each other by a network and comprises:

above element processors $PE_1, \ldots, PE_{P_c}$,

a memory that stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k components
of each known vector $b^{(r)}$ and the unknown vector x,

a C gateway that is a junction with the outside, and

a transmitter that transmits data between the memory and the outside through the C gateway.

If the (pk + 1)th through (p + 1)k-th rows of $A^{(0)}$ and corresponding components of $b^{(0)}$ and x are
assigned to the cluster $CL_u$, then the pivot choosing section, the updating section and the back-substitution
section of each element processor of $CL_u$ take charge of part of the k rows and 2k components row by row,
while the preprocessing section $A_t$ of each element processor of $CL_u$ takes charge of elements of the (pk +
t)th row of $A^{(r)}$ and the (pk + t)th component of $b^{(r)}$ one by one.

Specifically, the pivot choosing section of the element processor $PE_1$ of $CL_u$ determines the transposed
pivot (3) of the (pk + 1)th row, and the preprocessing sections $A_1$ of element processors of $CL_u$
simultaneously calculate (4) for pk + 2 ≤ j ≤ n and (5), with each $A_1$ calculating for elements and
components in its charge, and the transmitter transmits the results to the memory of every other cluster
through the C gateway, while the updating section B of the element processor in charge of the i-th row
calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel
preprocessing $CLA_1$.

The preprocessing sections $A_t$ of the above cluster $CL_u$ simultaneously calculate (6), (7), (8), (9), (10) for
pk + t ≤ j ≤ n with each $A_t$ calculating for elements and components in its charge and, immediately after
the pivot choosing section of $PE_t$ of $CL_u$ determines the pivot (11), simultaneously calculate (12) and (13)
for pk + t + 1 ≤ j ≤ n, and the transmitter transmits the results to the memory of every other cluster through
the C gateway, while the updating section B of the element processor in charge of the i-th row calculates (30)
for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel preprocessing
$CLA_t$, where 2 ≤ t ≤ k.

The updating sections B of each element processor in charge of the i-th row such that (p + 1)k + 1 ≤ i
≤ n calculate (14) through (18) for (p + 1)k + 1 ≤ j ≤ n retaining the values of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$
in the register set. These operations are below called parallel updating $B^c$.

According to a further aspect of the present invention there is provided a main controller $G_{pc}$ that is
connected to the above system, distributes and assigns the rows of the coefficient matrix $A^{(0)}$ and the
components of $b^{(0)}$ and x to the clusters in such a manner that each block of k consecutive rows and the
corresponding 2k components is transmitted to the memory of one cluster in the cyclic order of
$CL_0, \ldots, CL_{P-1}, CL_0, CL_1, \ldots$, and, if n is a multiple of k, instructs each cluster to execute parallel
preprocessing $CLA_1$ through $CLA_k$ and parallel updating $B^c$ for p = 0, 1, ..., n/k - 2 and to execute
$CLA_1$ through $CLA_k$ for p = n/k - 1, and, if n is not a multiple of k, instructs each cluster to execute
$CLA_1$ through $CLA_k$ and $B^c$ for p = 0, 1, ..., [n/k] - 1 and to execute $CLA_1$ through $CLA_{n-[n/k]k}$
for p = [n/k], and instructs each cluster to obtain the unknown vector x by means of the back-substitution
sections of its element processors and its transmitter.

According to another aspect of the present invention there is provided an element processor comprising:

a pivot choosing section that, for coefficient matrices $A^{(r)}$, known vectors $b^{(r)}$ and an unknown vector x
expressed by (1) for a given system of linear equations (2), chooses a pivot in the i-th row of $A^{(i-1)}$ and
interchanges the i-th column with the chosen pivotal column,

a preprocessing section $A_1$ that is connected to the pivot choosing section and calculates (4) for pk +
2 ≤ j ≤ n and (5),

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to the pivot
choosing section, calculates (6), (7), ..., (10) for pk + t ≤ j ≤ n, and calculates (12) and (13) for pk + t +
1 ≤ j ≤ n,

an updating section B' which is connected to the pivot choosing section, comprises a set of k registers
and an arithmetic unit, and calculates (14), (15), ..., (18) for (p + 1)k + 1 ≤ j ≤ n retaining the values
of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set,

k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to the pivot
choosing section and calculates (21), (22), ..., (29) for pk + t + 2 ≤ j ≤ n, and

a gateway that is connected to the pivot choosing section and is a junction with the outside.

According to a further aspect of the present invention there is provided a system of clusters $CL_0, \ldots,
CL_{P-1}$, each of which is connected to each other by a network and comprises:

above element processors $PE_1, \ldots, PE_{P_c}$,

a memory that stores the coefficient matrices $A^{(r)}$, the known vectors $b^{(r)}$ and the unknown vector x,

a C gateway that is a junction with the outside, and

a transmitter that transmits data between the memory and the outside through the C gateway.

If the (pk + 1)th through (p + 1)k-th rows of $A^{(0)}$ and corresponding components of $b^{(0)}$ and x are
assigned to the cluster $CL_u$, then the pivot choosing section and the updating section B' of each element
processor of $CL_u$ take charge of part of the k rows and 2k components row by row, while the preprocessing
section $A_t$ and postprocessing section $C_t$ of each element processor of $CL_u$ take charge of elements of the
(pk + t)th row of $A^{(r)}$ and the (pk + t)th component of $b^{(r)}$ one by one.

Specifically, the pivot choosing section of the element processor $PE_1$ of $CL_u$ determines the transposed
pivot (3) of the (pk + 1)th row, and the preprocessing sections $A_1$ of element processors of $CL_u$
simultaneously calculate (4) and (5) for pk + 2 ≤ j ≤ n with each $A_1$ calculating for elements and
components in its charge, and the transmitter transmits the results to the memory of every other cluster
through the C gateway, while the updating section B' of the element processor in charge of the i-th row
calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel
preprocessing $CLA_1$.

The preprocessing sections $A_t$ of element processors of the above cluster $CL_u$ simultaneously calculate
(6), (7), (8), (9), (10) for pk + t ≤ j ≤ n with each $A_t$ calculating for elements and components in its charge
and, immediately after the pivot choosing section of $PE_t$ of $CL_u$ determines the pivot (11), simultaneously
calculate (12) and (13) for pk + t + 1 ≤ j ≤ n, and the transmitter transmits the results to the memory of
every other cluster through the C gateway, while the updating section B' of the element processor in charge
of the i-th row calculates (30) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below
called parallel preprocessing $CLA_t$, where 2 ≤ t ≤ k.

The updating section B' of each element processor in charge of the i-th row such that 1 ≤ i ≤ pk or (p
+ 1)k + 1 ≤ i ≤ n if n is a multiple of k or p < [n/k] and 1 ≤ i ≤ [n/k]k otherwise also calculates (14) through
(18) for (p + 1)k + 1 ≤ j ≤ n if n is a multiple of k or p < [n/k] and for [n/k]k + 1 ≤ j ≤ n otherwise,
retaining the values of $Reg_i^{(0)}, \ldots, Reg_i^{(k)}$ in the register set. These operations are below called
parallel updating $B'^c$.

The postprocessing sections $C_t$ of element processors of the above $CL_u$ simultaneously calculate (21),
(22), ..., (29) for j such that pk + t + 2 ≤ j ≤ n for t = 1, 2, ..., k - 1 if n is a multiple of k or p < [n/k]
and for t = 1, 2, ..., n - [n/k]k otherwise. This series of operations is below called post-elimination $C^c$.

According to a further aspect of the present invention there is provided a main controller $J_{pc}$ that is
connected to the above system, distributes and assigns the rows of the coefficient matrix $A^{(0)}$ and the
components of $b^{(0)}$ and x to the clusters in such a manner that each block of k consecutive rows and the
corresponding 2k components is transmitted to the memory of one cluster in the cyclic order of
$CL_0, \ldots, CL_{P-1}, CL_0, CL_1, \ldots$, and, if n is a multiple of k, instructs each cluster to execute parallel
preprocessing $CLA_1$ through $CLA_k$, parallel updating $B'^c$ and post-elimination $C^c$ for p = 0, 1, ...,
n/k - 1, and, if n is not a multiple of k, instructs each cluster to execute parallel preprocessing $CLA_1$
through $CLA_k$, parallel updating $B'^c$, and post-elimination $C^c$ for p = 0, 1, ..., [n/k] - 1 and to execute
parallel preprocessing $CLA_1$ through $CLA_{n-[n/k]k}$, parallel updating $B'^c$, and post-elimination $C^c$ for
p = [n/k].

According to another aspect of the present invention, there is provided a parallel elimination method for
solving the system of linear equations (2) in a parallel computer comprising C clusters $CL_1, \ldots, CL_C$
connected by a network. Each of the clusters comprises $P_c$ element processors and a shared memory that
stores part of the reduced matrices $A^{(r)}$ and the known vectors $b^{(r)}$ and the unknown vector x. The method
comprises:

a data distribution means that distributes the rows of the coefficient matrix $A^{(0)}$ and the components of
$b^{(0)}$ and x to the shared memory of the clusters in such a manner that each block of k consecutive rows and
the corresponding 2k components is transmitted to the shared memory in the cyclic order of $CL_1, \ldots, CL_C,
CL_1, CL_2, \ldots$, and assigns those distributed to the cluster's shared memory to its element processors row
by row,

a pivot choosing means that chooses a pivot in a row assigned to each element processor,

an elementary pre-elimination means that, after the pivot choosing means chooses the pivot

$$a_{kP_c+1\,kP_c+1}^{(kP_c)}, \tag{31}$$

calculates

$$a_{kP_c+1\,j}^{(kP_c+1)} = a_{kP_c+1\,j}^{(kP_c)} / a_{kP_c+1\,kP_c+1}^{(kP_c)}, \tag{32}$$

$$b_{kP_c+1}^{(kP_c+1)} = b_{kP_c+1}^{(kP_c)} / a_{kP_c+1\,kP_c+1}^{(kP_c)} \tag{33}$$

in the element processor in charge of the ($kP_c$ + 1)th row, transmits the results to the shared memory of
every other cluster to which the element processor in charge of an i-th row such that $kP_c$ + 1 ≤ i ≤ n
belongs, and, for l = 2, ..., $P_c$, calculates

[Equation (34): the displayed formula is not recoverable from the source.]

for $kP_c$ + l ≤ i ≤ n in the element processor in charge of the i-th row, calculates

[Equations (35) and (36): the displayed formulas are not recoverable from the source.]

in the element processor in charge of the ($kP_c$ + l)th row, and, after the pivot choosing means determines
the pivot

$$a_{kP_c+l\,kP_c+l}^{(kP_c+l-1)}, \tag{37}$$

calculates

[Equations (38) and (39): the displayed formulas are not recoverable from the source.]

in the element processor in charge of the ($kP_c$ + l)th row, transmits the results (38) and (39) to the shared
memory of every other cluster to which the element processor in charge of an i-th row such that $kP_c$ + l +
1 ≤ i ≤ n belongs,

a multi-pivot elimination means that calculates

[Equations (40) and (41): the displayed formulas are not recoverable from the source.]

in each element processor in charge of the i-th row such that (k + 1)$P_c$ + 1 ≤ i ≤ n,

a means for testing if the operation of the multi-pivot elimination means was repeated [n/$P_c$] times, and

a remainder elimination means that executes the above elementary pre-elimination means for the
([n/$P_c$]$P_c$ + 1)th row through the n-th row, if the above testing means judges that the operation of the
multi-pivot elimination means was executed [n/$P_c$] times, and n is not a multiple of $P_c$.
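
The bookkeeping performed by the testing means and the remainder elimination means can be sketched as a small schedule. This is an illustration under stated assumptions: the function name `elimination_schedule` and its argument `pivots_per_step` (standing for $P_c$) are hypothetical, and rows are numbered from 1 as in the text.

```python
def elimination_schedule(n, pivots_per_step):
    """Return the row ranges of the full multi-pivot steps and of the remainder, if any."""
    full_steps = n // pivots_per_step                  # [n / P_c] full steps
    schedule = [
        (s * pivots_per_step + 1, (s + 1) * pivots_per_step)
        for s in range(full_steps)
    ]
    remainder = None
    if n % pivots_per_step:                            # n is not a multiple of P_c
        remainder = (full_steps * pivots_per_step + 1, n)
    return schedule, remainder
```

For n = 10 and four pivots per step, the full steps cover rows 1 to 4 and 5 to 8, and the remainder covers rows 9 and 10. For n = 8 there is no remainder.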

According to a further aspect of the present invention, there is provided a parallel computation method
comprising:

an elementary back-substitution means that calculates

$$x_i = b_i^{(n)} \tag{42}$$

in the element processor in charge of the i-th row after the elimination process of the above parallel
elimination method,

an elementary back-transmission means that transmits $x_i$ to the shared memory of every cluster to
which the element processor in charge of an h-th row such that 1 ≤ h ≤ i - 1 belongs,

an elementary back-calculation means that calculates

[Equation (43): the displayed formula is not recoverable from the source.]

for 1 ≤ h ≤ i - 1 in the element processor in charge of the h-th row, and

a means for testing if the operation of the elementary back-substitution means was repeated from i = n
to i = 1.

The solution of the system of linear equations (2) is thus obtained by the elementary back-substitution as

$$x_n = b_n^{(n)}, \ldots, x_1 = b_1^{(n)} \tag{44}$$

in this order.
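
The sequence of back-substitution, back-transmission and back-calculation can be sketched sequentially as follows. The function name `back_substitute` is hypothetical. The sketch assumes an upper triangular matrix with unit diagonal, as left by an elimination that normalizes each pivot row, and it uses the textbook update of subtracting $a_{hi} x_i$ from $b_h$ for the back-calculation step.

```python
def back_substitute(upper, rhs):
    """Solve a unit upper triangular system, one unknown at a time from the last row up."""
    n = len(rhs)
    b = rhs[:]                     # work on a copy of the right-hand side
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = b[i]                # the i-th unknown is read off directly
        for h in range(i):         # every row above receives x[i] and updates its b[h]
            b[h] -= upper[h][i] * x[i]
    return x
```

In the parallel setting the inner loop is what the rows above row i would carry out independently once $x_i$ has been transmitted to them.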

According to another aspect of the present invention, there is provided a parallel elimination method for 
solving the system of linear equations (2) in a parallel computer comprising C clusters CLi , . . . , CLc 
connected by a network. Each of the clusters comprises P c element processors and a shared memory that 
30 stores part of the reduced matrices A <r) and the known vectors b <r) and the unknown vector x. The method 
comprises: 

a data distribution means that distributes the rows of the coefficient matrix A (0) and the components of 
b (0) and x to the clusters in such a manner as each block of consecutive k rows and corresponding 2k 

components is transmitted to the shared memory in the cyclic order of CLi CLc, CLi , CU and 

35 assigns those distributed to the cluster's shared memory to its element processors row by row, 

a pivot choosing means that chooses a pivot in a row assigned to each element processor, 

an elementary pre-elimination means that, after the pivot choosing means chooses the pivot (31),
calculates (32) and (33) in the element processor in charge of the (P_c k + 1)th row, transmits the results to
the shared memory of every other cluster to which the element processor in charge of an i-th row such that
kP_c + 2 ≤ i ≤ n belongs, and, for l = 2, . . . , P_c, calculates (34) for kP_c + l ≤ i ≤ n in the element
processor in charge of the i-th row, calculates (35) and (36) in the element processor in charge of the
(kP_c + l)th row, and, after the pivot choosing means chooses the pivot (37), calculates (38) and (39) in the
element processor in charge of the (kP_c + l)th row, and transmits the results (38) and (39) to the shared
memory of every other cluster to which an element processor in charge of the i-th row such that
kP_c + l + 1 ≤ i ≤ n belongs,

a multi-pivot elimination means that calculates (43) and (44) in each element processor in charge of the
i-th row such that (k + 1)P_c + 1 ≤ i ≤ n,

an elementary post-elimination means that calculates 


air 1 ' - «ff - < (45) 

b}** = Hf» - ahl 2 b& 1} (46) 

in the element processor in charge of the i-th row, 
a post-elimination processing means that calculates (45) and (46) for l = -w + q + 1 for w = 1, . . . , q
and q = 1, . . . , P_c - 1 for kP_c + 1 ≤ i ≤ kP_c + q in the element processor in charge of the i-th row,

a means for testing if the operation of the post-elimination means was executed [n/P_c] times, and

a remainder elimination means that executes the above elementary pre-elimination means for the
([n/P_c]P_c + 1)th through the n-th rows and executes the above multi-pivot elimination means and the
post-elimination means, if the above testing means judges that the operation of the post-elimination means
was executed [n/P_c] times.

According to a further aspect of the present invention, there is provided 

a search means whereby, if a diagonal element of a coefficient matrix is 0, an above element processor
searches for a nonzero element in the order of increasing column numbers from that diagonal element in
the same row,

a column number broadcasting means that notifies other element processors of the column number of a
nonzero element found by the above search means,

an element interchange means whereby each element processor interchanges the two elements which
are in its charge and have the same column numbers as the above diagonal zero element and the found
nonzero element, and

a component interchange means whereby two element processors interchange the two components of
the unknown vector which are in their charge and have the same component indices as the column
numbers of the above diagonal zero element and the found nonzero element.
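
The four means above can be illustrated with a minimal sequential sketch. It assumes the whole matrix is held as a list of rows in one process, so the broadcast of the column number is implicit; the function name `repair_zero_diagonal` and the index map `perm` (which records which unknown currently sits in each column) are hypothetical and are not taken from the specification.

```python
def repair_zero_diagonal(a, perm, i):
    """If a[i][i] is zero, swap in the first nonzero entry to its right.

    a    : list of row lists, modified in place
    perm : perm[j] is the index of the unknown currently in column j
    i    : 0-based index of the row whose diagonal entry is checked
    Returns the column swapped with column i, or None if no swap was needed.
    """
    if a[i][i] != 0:
        return None
    for j in range(i + 1, len(a[i])):      # increasing column numbers
        if a[i][j] != 0:
            for row in a:                  # element interchange, one row per processor
                row[i], row[j] = row[j], row[i]
            perm[i], perm[j] = perm[j], perm[i]   # component interchange of x
            return j
    raise ValueError("no nonzero element to the right of the diagonal")
```

In the parallel setting each inner swap would be performed by the element processor that owns the row, after it has been notified of column `j`.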
According to a further aspect of the present invention, there is provided

a search means whereby an above element processor searches for an element with the greatest
absolute value in the order of increasing column numbers from a diagonal element in the same row,

a column number broadcasting means that notifies other element processors of the column number of
an element found by the above search means,

an element interchange means whereby each element processor interchanges the two elements which
are in its charge and have the same column numbers as the above diagonal element and the found element,
and

a component interchange means whereby two element processors interchange the two components of
the unknown vector which are in their charge and have the same component indices as the column
numbers of the above diagonal element and the found element.

BRIEF DESCRIPTION OF THE DRAWINGS 

These and other objects and features of the present invention will become clear from the following
description taken in conjunction with the preferred embodiments thereof with reference to the accompanying
drawings throughout which like parts are designated by like reference numerals, and in which:

Fig. 1 is a block diagram of linear calculating equipment according to the first embodiment of the present invention.
Fig. 2 is a flow chart of a control algorithm to be performed in the first embodiment.
Fig. 3 is a block diagram of linear calculating equipment according to the second embodiment of the present invention.
Fig. 4 is a flow chart of the control algorithm to be performed in the second embodiment.
Fig. 5 is a block diagram of parallel linear calculating equipment according to the third embodiment of the present invention.
Fig. 6 is a block diagram of a node shown in Fig. 5.
Fig. 7 is a flow chart of the control algorithm to be performed in the third embodiment.
Fig. 8 is a block diagram of parallel linear calculating equipment according to the fourth embodiment of the present invention.
Fig. 9 is a block diagram of a node shown in Fig. 8.
Fig. 10 is a flow chart of the control algorithm to be performed in the fourth embodiment.
Fig. 11 is a block diagram of parallel linear calculating equipment according to the fifth embodiment of the present invention.
Fig. 12 is a block diagram of a cluster shown in Fig. 11.
Fig. 13 is a block diagram of an element processor shown in Fig. 12.
Fig. 14 is a flow chart of the control algorithm to be performed in the fifth embodiment.
Fig. 15 is a block diagram of parallel linear calculating equipment according to the sixth embodiment of the present invention.
Fig. 16 is a block diagram of a cluster shown in Fig. 15.
Fig. 17 is a block diagram of an element processor shown in Fig. 16.
Fig. 18 is a flow chart of the control algorithm to be performed in the sixth embodiment.
Fig. 19 is a block diagram of an element processor or processor module in a parallel computer which implements the 7th and 8th embodiments.
Fig. 20 is a block diagram of a cluster used in the 7th and 8th embodiments.
Fig. 21 is a block diagram of the parallel computation method according to the 7th embodiment.
Fig. 22 is a block diagram of the parallel computation method according to the 8th embodiment.
Fig. 23 is a diagram for showing the pivoting method according to the 7th and 8th embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The preferred embodiments according to the present invention will be described below with reference 
to the attached drawings. 

Fig. 1 is a block diagram of linear calculating equipment in the first embodiment of the present
invention. In Fig. 1, 1 is a memory; 2 is a pivoting section connected to the memory 1; 3, 4, 5 are
preprocessing sections A_1, A_t, A_k respectively, each connected to the memory 1; 6 is an updating section B
connected to the memory 1; 7 is a back-substitution section connected to the memory 1; 8 is a main
controller G; 101 is a register set composed of k registers; 102 is an arithmetic unit.

Following is a description of the operation of each component of the first embodiment. 
The memory 1 is ordinary semiconductor memory and stores reduced coefficient matrices A^(r) with
zeroes generated from the first to the r-th column and corresponding known vectors b^(r) and an unknown
vector x expressed by (1) for a given system of linear equations (2).

The pivoting section 2 is connected to the memory 1, chooses a pivot in the i-th row following the
instruction of the main controller G 8 when the first (i - 1) columns are already reduced, and interchanges
the i-th column with the chosen pivotal column and the i-th component of x with the corresponding
component of x. The choice of the pivot is based on a method called partial pivoting, whereby an element
with the largest absolute value in the i-th row is chosen as the pivot. The interchange can be direct data
transfer or transposition of column numbers and component indices.
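
As an illustration of the pivoting section's task, the following is a minimal sketch of partial pivoting by column interchange. It assumes the search is restricted to columns i through n of row i (the earlier columns having been eliminated), uses direct data transfer for the matrix and an index map for x, and introduces the hypothetical names `pivot_row` and `perm`; none of these details are fixed by the text.

```python
def pivot_row(a, perm, i):
    """Choose the pivot in row i and bring it to the diagonal by a column swap.

    a    : list of row lists (the reduced matrix), modified in place
    perm : perm[j] is the index of the unknown currently in column j
    i    : 0-based index of the row being processed
    Returns the pivot value now stored at a[i][i].
    """
    n = len(a)
    # Find the entry of largest magnitude among columns i..n-1 of row i.
    pivot_col = max(range(i, n), key=lambda j: abs(a[i][j]))
    if pivot_col != i:
        for row in a:  # direct data transfer: swap the two columns in every row
            row[i], row[pivot_col] = row[pivot_col], row[i]
        # Transposition of component indices for the unknown vector x.
        perm[i], perm[pivot_col] = perm[pivot_col], perm[i]
    return a[i][i]
```

Keeping `perm` instead of moving the components of x is one way to realise the "transposition of column numbers and component indices" option; the solution is put back in the original order at the end by reading it through `perm`.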

Immediately after the pivoting section 2 determines the transposed pivot (3), the preprocessing section
A_1 3 calculates (4) for pk + 2 ≤ j ≤ n and (5) following the instruction of the main controller G 8. Each
preprocessing section A_t 4, where t = 2, 3, . . . , k, is connected to the memory 1, calculates (6), (7), (8),
(9), (10) for pk + t ≤ j ≤ n, and, immediately after the pivoting section 2 determines the transposed pivot (11),
calculates (12) and (13) for pk + t + 1 ≤ j ≤ n following the instruction of the main controller G 8.

The updating section B 6 is connected to the memory 1, comprises a register set 101 of k registers and
an arithmetic unit 102, and calculates (14), (15), (16), (17), (18) for (p + 1)k + 1 ≤ i, j ≤ n in the arithmetic
unit 102, retaining each value of

Reg_i^(0), . . . , Reg_i^(k)

in the corresponding register of the register set 101 following the instruction of the main controller G 8.
Formulas (14), (15), (16) are preliminary formulas, and (17) and (18) are formulas that determine updated
components.

The back-substitution section 7 is connected to the memory 1 and obtains the value of the unknown
vector x by calculating (19) and (20) for 1 ≤ h ≤ i - 1 for i = n, n - 1, . . . , 1 in this order of i.
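
The loop structure of the back-substitution section can be sketched as follows. Formulas (19) and (20) are not reproduced in this passage, so the sketch assumes the usual column-oriented scheme: (19) is taken to yield x_i from the i-th component of the known vector, and (20) to subtract the contribution of x_i from the components with h < i. The division by the diagonal entry is included for generality and is a no-op if the pivots have already been normalized to 1. The function name `back_substitute` is hypothetical.

```python
def back_substitute(a, b):
    """Column-oriented back-substitution on an upper-triangular system a x = b."""
    n = len(b)
    b = list(b)                      # work on a copy of the known vector
    x = [0.0] * n
    for i in range(n - 1, -1, -1):   # i = n, n - 1, ..., 1 in the text
        x[i] = b[i] / a[i][i]        # assumed role of formula (19)
        for h in range(i):           # 1 <= h <= i - 1 in the text
            b[h] -= a[h][i] * x[i]   # assumed role of formula (20)
    return x
```

For example, with a = [[2, 1], [0, 4]] and b = [5, 8] the sketch first obtains x_2 = 2, updates b_1 to 3, and then obtains x_1 = 1.5.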

The operation of the main controller G 8 is described below with reference to Fig. 2, which shows a flow 
chart of its control algorithm. 

The first step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
loop of the left side. The t-th step within this loop, where t = 1, . . . , k, instructs the pivoting section 2 and
the preprocessing section A_t 4 to execute their operations for the (pk + t)th row of the current reduced
matrix A^(pk+t-1). The next step tests if p = n/k - 1. If it is, then the next step escapes the loop. If p < n/k - 1,
then the next step instructs the updating section B 6 to execute its operation. The next step increments p
by 1 and returns to the operations of the pivoting section 2 and the preprocessing section A_1 3.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the loop of the right side.
Within this loop, the operations are the same except that the condition for escaping the loop is p =
[n/k], and the position of the testing for escape is immediately after the operation of A_{n-[n/k]k}.

After escaping one of the loops, the final step instructs the back-substitution section 7 to execute its
operation and terminates the whole operation to obtain the unknown vector x.
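
The control flow described for the main controller G can be summarised in a short sketch that only lists the order in which sections are instructed, without doing any arithmetic. The function name `control_schedule` and the operation labels are hypothetical; rows are numbered from 1 as in the text.

```python
def control_schedule(n, k):
    """Return the sequence of instructions a controller like G would issue."""
    ops = []
    full_blocks, remainder = divmod(n, k)   # [n/k] and n - [n/k]k
    p = 0
    while True:
        # Left loop escapes at p = n/k - 1, right loop at p = [n/k].
        last = (p == full_blocks - 1) if remainder == 0 else (p == full_blocks)
        # In the final pass of the right loop only n - [n/k]k rows remain.
        steps = remainder if (remainder and last) else k
        for t in range(1, steps + 1):
            ops.append(("pivot+preprocess", p * k + t))
        if last:
            break
        ops.append(("update", p))           # updating section B
        p += 1
    ops.append(("back-substitute",))
    return ops
```

With n = 5 and k = 2, for instance, the schedule processes rows 1 and 2, updates, processes rows 3 and 4, updates, processes row 5 alone, and then runs the back-substitution.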
Fig. 3 is a block diagram of linear calculating equipment in the second embodiment of the present
invention. In Fig. 3, 1 is a memory; 2 is a pivoting section connected to the memory 1; 3, 4, 5 are
preprocessing sections A_1, A_t, A_k respectively, each connected to the memory 1; 9 is an updating section
B' connected to the memory 1; 10, 11, 12 are postprocessing sections C_1, C_t, C_{k-1} respectively, each
connected to the memory 1; 13 is a main controller J; 103 is a register set composed of k registers; 104 is
an arithmetic unit.

Following is a description of the operation of each component that differs from the first
embodiment.

The updating section B' 9 is connected to the memory 1 and calculates (14), (15), . . . , (18) for 1 ≤ i ≤
pk, (p + 1)k + 1 ≤ i ≤ n, (p + 1)k + 1 ≤ j ≤ n if n is a multiple of k or p < [n/k], and for 1 ≤ i ≤ [n/k]k,
[n/k]k + 1 ≤ j ≤ n otherwise, in the arithmetic unit 104, retaining each value of

Reg_i^(0), . . . , Reg_i^(k)

in the corresponding register of the register set 103.

The k - 1 postprocessing sections C_t 11, where t = 1, 2, . . . , k - 1, are connected to the memory 1
and calculate (21), (22), . . . , (29) for pk + t + 2 ≤ j ≤ n.

The operation of the main controller J 13 is described below with reference to Fig. 4, which shows a
flow chart of its control algorithm.

The first step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
left side loop. The t-th step within this loop, where t = 1, . . . , k, instructs the pivoting section 2 and the
preprocessing section A_t 4 to execute their operations for the (pk + t)th row of the current reduced matrix
A^(pk+t-1). The next step instructs the updating section B' 9 to execute its operation. The following k - 1 steps
instruct the postprocessing sections C_1 10 through C_{k-1} 12 to execute their operations in this order. The
next step tests if p = n/k - 1. If it is, then the next step escapes the loop and terminates operation. If p <
n/k - 1, then the next step increments p by 1 and returns to the operation of the pivoting section 2.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the right side loop. Within
this loop, the first n - [n/k]k + 1 steps are the same as those in the loop of the left side. After instructing the
preprocessing section A_{n-[n/k]k} 4 to execute its operation, the step tests if p = [n/k]. If it is not, then the
following steps order the operations of the pivoting section 2 and the preprocessing section A_{n-[n/k]k+1} 4
through the operations of the pivoting section 2 and the preprocessing section A_k 5, followed by the
operation of the updating section B' 9 and then the operations of the postprocessing sections C_1 10 through
C_{k-1} 12. Then the step increments p by 1 and returns to the operation of the pivoting section 2. If p = [n/k],
then the following steps instruct the updating section B' 9 to execute its operation, instruct the
postprocessing sections C_1 10 through C_{n-[n/k]k} 11 to execute their operations, and terminate the whole
process to obtain the unknown vector.

Fig. 5 is a block diagram of parallel linear calculating equipment in the third embodiment of the present
invention. In Fig. 5, 21 is a network; 22, 23, 24 are nodes a_0, a_u, a_{P-1} mutually connected by the network 21;
25 is a main controller G_p connected to each node. Fig. 6 is a block diagram of a node in Fig. 5. In Fig. 6, 1
is a memory; 2 is a pivoting section connected to the memory 1; 3, 4, 5 are preprocessing sections A_1, A_t,
A_k respectively, each connected to the memory 1; 6 is an updating section B connected to the memory 1; 7
is a back-substitution section connected to the memory 1; 26 is a gateway that is a junction with the outside;
27 is a transmitter that transmits data between the memory 1 and the outside through the gateway 26; 101
is a register set composed of k registers; 102 is an arithmetic unit.

Following is a description of the operation of each component of the third embodiment.

If the (pk + 1)th through (p + 1)k-th rows of A^(0) and corresponding components of b^(0) and x are
assigned to the node a_u 23, then the pivoting section 2 of the node a_u 23 determines the pivot (3), and the
preprocessing section A_1 3 of the node a_u 23 calculates (4) and (5) for pk + 2 ≤ j ≤ n, and the transmitter 27
transmits the results to the memory 1 of every other node through the gateway 26, while the updating
section B 6 of the element processor in charge of the i-th row calculates (14) for every i such that (p + 1)k
+ 1 ≤ i ≤ n. This series of operations is below called parallel preprocessing A_1.

The preprocessing section A_t 4 of the node a_u 23 calculates (6), (7), (8), (9), (10) for pk + t ≤ j ≤ n, and,
immediately after the pivoting section 2 of a_u 23 determines the pivot (11), calculates (12) and (13) for pk +
t + 1 ≤ j ≤ n, and the transmitter 27 transmits the results to the memory 1 of every other node through the
gateway 26, while the updating section B 6 of the element processor in charge of the i-th row calculates
(30) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of parallel operations is below called parallel
preprocessing A_t, where 2 ≤ t ≤ k.

The updating section B 6 of each node in charge of the i-th row such that (p + 1)k + 1 ≤ i ≤ n also
calculates (14) through (18) for (p + 1)k + 1 ≤ j ≤ n, retaining the values of

Reg_i^(0), . . . , Reg_i^(k)

in the register set. These operations are below called parallel updating B.

The back-substitution sections 7 of the nodes a_u 23 calculate (19) and (20) using necessary data
transmitted by the transmitters 27 of other nodes. These operations are called back-substitution.

The operation of the main controller G_p 25 is described below with reference to Fig. 7, which shows a
flow chart of its control algorithm at the level of the above definitions.

The first step distributes and assigns the rows of the coefficient matrix A^(0) and the components of b^(0)
and x to the nodes a_0 22, . . . , a_u 23, . . . , a_{P-1} 24 in such a manner that each block of k rows and
corresponding 2k components (n - [n/k]k rows and 2(n - [n/k]k) components in the final distribution) is
transmitted to the memory 1 of one node at a time in the cyclic order of a_0, . . . , a_{P-1}, a_0, a_1, . . . .
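
The cyclic distribution of row blocks can be sketched as below. This is only an illustration of the assignment rule: it returns, for each node index u = 0, . . . , P - 1, the 1-based rows that node receives, and it ignores the accompanying components of b and x, which follow the same pattern. The function name `distribute_blocks` and the parameter `num_nodes` are hypothetical.

```python
def distribute_blocks(n, k, num_nodes):
    """Map each node index to the list of 1-based rows it receives."""
    owner = {u: [] for u in range(num_nodes)}
    # Blocks start at rows 1, k + 1, 2k + 1, ...; the last block may be short.
    for block, start in enumerate(range(1, n + 1, k)):
        rows = list(range(start, min(start + k, n + 1)))
        owner[block % num_nodes].extend(rows)   # cyclic order a_0, ..., a_{P-1}, a_0, ...
    return owner
```

For n = 7, k = 2 and two nodes, node 0 receives rows 1, 2, 5, 6 and node 1 receives rows 3, 4 and the final short block consisting of row 7.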

The next step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
loop of the left side. The t-th step within this loop orders the execution of the parallel preprocessing A_t for
the (pk + t)th row of the current reduced matrix A^(pk+t-1). The next step tests if p = n/k - 1. If it is, then the
next step escapes the loop. If p < n/k - 1, then the next step orders the execution of the parallel updating B.
The next step increments p by 1 and returns to the execution of the parallel preprocessing A_1.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the loop of the right side.
Within this loop, the operations are the same except that the condition for escaping the loop is p =
[n/k], and the position of the testing for escape is between the parallel preprocessing A_{n-[n/k]k} and
A_{n-[n/k]k+1}.

After escaping one of the loops, the final step orders the execution of back-substitution and terminates
the whole operation to obtain the unknown vector x.

Fig. 8 is a block diagram of parallel linear calculating equipment in the fourth embodiment of the
present invention. In Fig. 8, 31 is a network; 32, 33, 34 are nodes a_0, a_u, a_{P-1} mutually connected by the
network 31; 35 is a main controller J_p connected to each node. Fig. 9 is a block diagram of a node in Fig. 8.
In Fig. 9, 1 is a memory; 2 is a pivoting section connected to the memory 1; 3, 4, 5 are preprocessing
sections A_1, A_t, A_k respectively, each connected to the memory 1; 9 is an updating section B' connected to
the memory 1; 10, 11, 12 are postprocessing sections C_1, C_t, C_{k-1} respectively, each connected to the
memory 1; 26 is a gateway that is a junction with the outside; 27 is a transmitter that transmits data
between the memory 1 and the outside through the gateway 26; 103 is a register set composed of k
registers; 104 is an arithmetic unit.

Following is a description of the operation of each component of the fourth embodiment.

If the (pk + 1)th through (p + 1)k-th rows of A^(0) and corresponding components of b^(0) and x are
assigned to the node a_u 33, then the pivoting section 2 of the node a_u 33 determines the pivot (3), and the
preprocessing section A_1 3 of the node a_u 33 calculates (4) and (5) for pk + 2 ≤ j ≤ n, and the transmitter 27
transmits the results to the memory 1 of every other node through the gateway 26, while the updating
section B' 9 of the element processor in charge of the i-th row calculates (14) for every i such that (p + 1)k
+ 1 ≤ i ≤ n. This series of operations is below called parallel preprocessing A_1.

The preprocessing section A_t 4 of the node a_u 33 calculates (6), (7), (8), (9), (10) for pk + t ≤ j ≤ n, and,
immediately after the pivoting section 2 of a_u 33 determines the pivot (11), calculates (12) and (13) for pk +
t + 1 ≤ j ≤ n, and the transmitter 27 transmits the results to the memory 1 of every other node through the
gateway 26, while the updating section B' 9 of the element processor in charge of the i-th row calculates
(30) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel
preprocessing A_t, where 2 ≤ t ≤ k.
The updating section B' 9 of each node in charge of the i-th row such that 1 ≤ i ≤ pk or (p + 1)k + 1 ≤
i ≤ n if n is a multiple of k or p < [n/k], and 1 ≤ i ≤ [n/k]k otherwise, also calculates (14) through (18) for
(p + 1)k + 1 ≤ j ≤ n if n is a multiple of k or p < [n/k], and for [n/k]k + 1 ≤ j ≤ n otherwise, retaining the
values of

Reg_i^(0), . . . , Reg_i^(k)

in the register set. These operations are below called parallel updating B'.

The postprocessing sections C_t 11 of the above node a_u 33 calculate (21), (22), . . . , (29) for pk + t + 2
≤ j ≤ n for t = 1, 2, . . . , k - 1 if n is a multiple of k or p < [n/k], and for t = 1, 2, . . . , n - [n/k]k otherwise.
This series of operations is below called post-elimination C.

The operation of the main controller J_p 35 is described below with reference to Fig. 10, which shows a
flow chart of its control algorithm at the level of the above definitions.

The first step distributes and assigns the rows of the coefficient matrix A^(0) and the components of b^(0)
and x to the nodes a_0 32, . . . , a_u 33, . . . , a_{P-1} 34 in such a manner that each block of k rows and
corresponding 2k components (n - [n/k]k rows and 2(n - [n/k]k) components in the final distribution) is
transmitted to the memory 1 of one node at a time in the cyclic order of a_0, . . . , a_{P-1}, a_0, a_1, . . . .

The next step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
loop of the left side. The t-th step within this loop orders the execution of the parallel preprocessing A_t for
the (pk + t)th row of the current reduced matrix A^(pk+t-1). The next step orders the execution of the parallel
updating B'. The next step orders the execution of the post-elimination C. The next step tests if p = n/k - 1.
If it is, then the next step escapes the loop. If p < n/k - 1, then the next step increments p by 1 and returns
to the execution of the parallel preprocessing A_1.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the loop of the right side.
Within this loop, the operations are the same except that the condition for escaping the loop is p =
[n/k], and if p = [n/k], the steps skip the order for the execution of the parallel preprocessing A_{n-[n/k]k+1}
through A_k.

By the above processing, the unknown vector is obtained. 

Fig. 11 is a block diagram of parallel linear calculating equipment according to the fifth embodiment of
the present invention. In Fig. 11, 41 is a network; 42, 43, 44 are clusters CL_0, CL_u, CL_{P-1} mutually connected
by the network 41; 45 is a main controller G_pc connected to each cluster. Fig. 12 is a block diagram of a
cluster in Fig. 11. In Fig. 12, 1 is a memory; 46 is a C gateway that is a junction with the outside; 47, 48, 49
are element processors PE_1, PE_2, PE_{P_c}, each connected to the memory 1; 50 is a transmitter that
transmits data between the memory 1 and the outside through the C gateway 46. Fig. 13 is a block diagram
of an element processor in Fig. 12. In Fig. 13, 2 is a pivoting section; 3, 4, 5 are preprocessing sections
A_1, A_t, A_k respectively, each connected to the pivoting section 2; 6 is an updating section B connected to
the pivoting section 2; 7 is a back-substitution section connected to the pivoting section 2; 51 is a gateway
that is a junction with the outside; 101 is a register set composed of k registers; 102 is an arithmetic unit.

Following is a description of the operation of each component of the fifth embodiment. 

If the (pk + 1)th through (p + 1)k-th rows of A^(0) and corresponding components of b^(0) and x are
assigned to the cluster CL_u 43, then the pivoting section 2, the updating section B 6 and the back-substitution
section 7 of each element processor of CL_u 43 take charge of part of the k rows and 2k components row by
row, while the preprocessing section A_t 4 of each element processor of CL_u 43 takes charge of elements of
the (pk + t)th row of A^(r) and the (pk + t)th component of b^(r) one by one.

Specifically, the pivoting section 2 of the element processor PE_1 of CL_u 43 determines the transposed
pivot (3) of the (pk + 1)th row, and the preprocessing sections A_1 3 of element processors of CL_u 43
simultaneously calculate (4) and (5) for pk + 2 ≤ j ≤ n with each A_1 3 calculating for elements and
components in its charge, and the transmitter 50 transmits the results to the memory of every other cluster
through the C gateway 46, while the updating section B 6 of the element processor in charge of the i-th row
calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called parallel
preprocessing CLA_1.

The preprocessing sections A_t 4 of the above cluster CL_u 43 simultaneously calculate (6), (7), (8), (9),
(10) for pk + t ≤ j ≤ n with each A_t 4 calculating for elements and components in its charge and,
immediately after the pivoting section 2 of PE_t of CL_u 43 determines the pivot (11), simultaneously calculate
(12) and (13) for pk + t + 1 ≤ j ≤ n, and the transmitter 50 transmits the results to the memory 1 of every
other cluster through the C gateway 46, while the updating section B 6 of the element processor in charge
of the i-th row calculates (30) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below
called parallel preprocessing CLA_t, where 2 ≤ t ≤ k.

The updating section B 6 of each element processor in charge of the i-th row such that (p + 1)k + 1
≤ i ≤ n calculates (14) through (18) for (p + 1)k + 1 ≤ j ≤ n, retaining the values of

Reg_i^(0), . . . , Reg_i^(k)

in the register set 101. These operations are below called parallel updating B_c.

The back-substitution sections 7 of element processors calculate (19) and (20) using necessary data 
transmitted by the transmitters 50 of other clusters. These operations are called back-substitution. 

The operation of the main controller G_pc 45 is described below with reference to Fig. 14, which shows a
flow chart of its control algorithm at the level of the above definitions.

The first step distributes and assigns the rows of the coefficient matrix A^(0) and the components of b^(0)
and x to the clusters CL_0 42, . . . , CL_u 43, . . . , CL_{P-1} 44 in such a manner that each block of k rows and
corresponding 2k components (n - [n/k]k rows and 2(n - [n/k]k) components in the final distribution) is
transmitted to the memory 1 of one cluster at a time in the cyclic order of CL_0, . . . , CL_{P-1}, CL_0, CL_1, . . . .

The next step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
loop of the left side. The t-th step within this loop orders the execution of the parallel preprocessing CLA_t
for the (pk + t)th row of the current reduced matrix A^(pk+t-1). The next step tests if p = n/k - 1. If it is, then
the next step escapes the loop. If p < n/k - 1, then the next step orders the execution of the parallel
updating B_c. The next step increments p by 1 and returns to the execution of the parallel preprocessing
CLA_1.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the loop of the right side.
Within this loop, the operations are the same except that the condition for escaping the loop is p =
[n/k], and the position of the testing for escape is between the parallel preprocessing CLA_{n-[n/k]k} and
CLA_{n-[n/k]k+1}.

After escaping one of the loops, the final step orders the execution of back-substitution and terminates
the whole operation to obtain the unknown vector x.

Fig. 15 is a block diagram of parallel linear calculating equipment according to the sixth embodiment
of the present invention. In Fig. 15, 61 is a network; 62, 63, 64 are clusters CL_0, CL_u, CL_{P-1} mutually
connected by the network 61; 65 is a main controller J_pc connected to each cluster. Fig. 16 is a block
diagram of a cluster in Fig. 15. In Fig. 16, 1 is a memory; 46 is a C gateway that is a junction with the
outside; 66, 67, 68 are element processors PE_1, PE_2, PE_{P_c}, each connected to the memory 1; 50 is a
transmitter that transmits data between the memory 1 and the outside through the C gateway 46. Fig. 17 is
a block diagram of an element processor shown in Fig. 16. In Fig. 17, 2 is a pivoting section; 3, 4, 5 are
preprocessing sections A_1, A_t, A_k respectively, each connected to the pivoting section 2; 9 is an updating
section B' connected to the pivoting section 2; 10, 11, 12 are postprocessing sections C_1, C_t, C_{k-1}
respectively, each connected to the pivoting section 2; 51 is a gateway that is a junction with the outside;
103 is a register set composed of k registers; 104 is an arithmetic unit.

Following is a description of the operation of each component of the sixth embodiment.

If the (pk + 1)th through (p + 1)k-th rows of A^(0) and corresponding components of b^(0) and x are
assigned to the cluster CL_u 63, then the pivoting section 2 and the updating section B' 9 of each element
processor of CL_u 63 take charge of part of the k rows and 2k components row by row, while the
preprocessing section A_t 4 and postprocessing section C_t 11 of each element processor of CL_u 63 take
charge of elements of the (pk + t)th row of A^(r) and the (pk + t)th component of b^(r) one by one.

Specifically, the pivoting section 2 of the element processor PE_1 of CL_u 63 determines the transposed
pivot (3) of the (pk + 1)th row, and the preprocessing sections A_1 3 of element processors of CL_u 63
simultaneously calculate (4) and (5) for pk + 2 ≤ j ≤ n with each A_1 3 calculating for elements and
components in its charge, and the transmitter 50 transmits the results to the memory 1 of every other
cluster through the C gateway 46, while the updating section B' 9 of the element processor in charge of the
i-th row calculates (14) for every i such that (p + 1)k + 1 ≤ i ≤ n. This series of operations is below called
parallel preprocessing CLA_1.

The preprocessing sections A_t 4 of the above cluster CL_u 63 simultaneously calculate (6), (7), (8), (9),
(10) for pk + t ≤ j ≤ n with each A_t 4 calculating for elements and components in its charge and,
immediately after the pivoting section 2 of the element processor PE_t of CL_u 63 determines the pivot (11),
simultaneously calculate (12) and (13) for pk + t + 1 ≤ j ≤ n, and the transmitter 50 transmits the results to
the memory 1 of every other cluster through the C gateway 46, while the updating section B' 9 of the
element processor in charge of the i-th row calculates (30) for every i such that (p + 1)k + 1 ≤ i ≤ n. This
series of operations is below called parallel preprocessing CLA_t, where 2 ≤ t ≤ k.

The updating section B' 9 of each element processor in charge of the i-th row such that 1 ≤ i ≤ pk or (p
+ 1)k + 1 ≤ i ≤ n if n is a multiple of k or p < [n/k], and 1 ≤ i ≤ [n/k]k otherwise, also calculates (14) through
(18) for (p + 1)k + 1 ≤ j ≤ n if n is a multiple of k or p < [n/k], and for [n/k]k + 1 ≤ j ≤ n otherwise, retaining
the values of

Reg_i^(0), . . . , Reg_i^(k)

in the register set. These operations are below called parallel updating B'_c.

The postprocessing sections C_t 11 of element processors of the above CL_u 63 simultaneously calculate
(21), (22), . . . , (29) for j such that pk + t + 2 ≤ j ≤ n for t = 1, 2, . . . , k - 1 if n is a multiple of k or p <
[n/k], and for t = 1, 2, . . . , n - [n/k]k otherwise, with each C_t 11 calculating for elements and components in
its charge. This series of operations is below called post-elimination C_c.

The operation of the main controller J_pc 65 is described below with reference to Fig. 18, which shows a
flow chart of its control algorithm at the level of the above definitions.

The first step distributes and assigns the rows of the coefficient matrix $A^{(0)}$ and the components of $b^{(0)}$
and x to the clusters $CL_0$ 62, ..., $CL_u$ 63, ..., $CL_{P-1}$ 64 in such a manner that each block of k rows and
the corresponding 2k components (n - [n/k]k rows and 2(n - [n/k]k) components in the final distribution) is
transmitted to the memory 1 of one node at a time in the cyclic order of $CL_0, \ldots, CL_{P-1}, CL_0, CL_1, \ldots$
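
The cyclic block distribution of this first step can be sketched as follows. This is a minimal illustration, not the controller's actual implementation; the function name `distribute_blocks`, its parameters, and the 0-based cluster numbering are assumptions made for the example.

```python
def distribute_blocks(n, k, num_clusters):
    """Assign rows 1..n to clusters in blocks of k rows, in cyclic order.

    Returns one list per cluster; each list holds (first_row, last_row)
    pairs with 1-based, inclusive row numbers. All names are hypothetical.
    """
    assignment = [[] for _ in range(num_clusters)]
    first = 1
    block = 0
    while first <= n:
        # The final block holds only n - [n/k]k rows when k does not divide n.
        last = min(first + k - 1, n)
        assignment[block % num_clusters].append((first, last))
        first = last + 1
        block += 1
    return assignment
```

For n = 10, k = 3 and two clusters, the first cluster receives rows 1 to 3 and 7 to 9, and the second receives rows 4 to 6 and the final short block consisting of row 10.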

The next step tests if n is a multiple of k. If it is, then the next step initializes p as p = 0 and enters the
loop of the left side. The t-th step within this loop orders the execution of the parallel preprocessing $CLA_t$
for the (pk + t)th row of the current reduced matrix $A^{(pk+t-1)}$. The next step orders the execution of the
parallel updating $B'_c$. The next step orders the execution of the post-elimination $C_c$. The next step tests if
p = n/k - 1. If it is, then the next step escapes the loop. If p < n/k - 1, then the next step increments p by 1
and returns to the execution of the parallel preprocessing $CLA_1$.

If n is not a multiple of k, then the next step initializes p as p = 0 and enters the loop of the right side.
Within this loop, the operations are the same except that the condition for escaping the loop is p =
[n/k], and if p = [n/k], the steps skip the order for the execution of the parallel preprocessing $CLA_{n-[n/k]k+1}$
through $CLA_k$.

By the above processing, the unknown vector is obtained. 
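
The sequence of orders issued by the main controller can be sketched as a simple schedule generator. This is a hedged illustration of the control flow described above, not the flow chart of Fig. 18 itself; the function name `controller_schedule` and the tuple encoding of each order are assumptions made for the example.

```python
def controller_schedule(n, k):
    """List the orders issued by the main controller as (operation, p, t) tuples.

    "CLA" stands for parallel preprocessing CLA_t, "B'c" for parallel updating
    and "Cc" for post-elimination. The encoding is hypothetical.
    """
    orders = []
    full_blocks = n // k                  # [n/k]
    remainder = n - full_blocks * k       # n - [n/k]k
    last_p = full_blocks - 1 if remainder == 0 else full_blocks
    for p in range(last_p + 1):
        # In the final partial block, CLA_{remainder+1} .. CLA_k are skipped.
        rows = k if p < full_blocks else remainder
        for t in range(1, rows + 1):
            orders.append(("CLA", p, t))
        orders.append(("B'c", p, None))
        orders.append(("Cc", p, None))
    return orders
```

With n = 5 and k = 2 the schedule runs two full passes (p = 0, 1) and a final pass for p = 2 in which only $CLA_1$ is ordered before the updating and post-elimination.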

Fig. 19 shows a block diagram of an element processor or processor module of a parallel computer that
implements the seventh embodiment of the present invention. In Fig. 19, 201 is a gateway; 202 is a cache
memory; 203 is a central processing unit; 204 is a local memory; 205 is a shared bus. Fig. 20 shows a
block diagram of a cluster composed of element processors 212, 213, ..., 214, a C gateway 210, and a
shared memory 211. A network of the parallel computer connects each of the clusters to each other, so that
data can be transmitted between any two clusters. Let the number of element processors in each cluster be
$P_c$ and the total number of clusters be C. Then the total number P of element processors in the parallel
computer is $C \cdot P_c$. Furthermore, let the clusters be denoted by $CL_1, CL_2, \ldots, CL_C$, and let the element
processors of $CL_u$ be denoted by $PR_{u,1}, \ldots, PR_{u,P_c}$.

Fig. 21 is a block diagram of the parallel linear computation method according to the seventh embodiment
of the present invention implemented by a parallel computer structured as above. In Fig. 21, 220 is a data
distribution means; 221 is a pivoting means; 222 is an elementary pre-elimination means; 223 is a
multi-pivot elimination means; 224 is an elimination testing means; 225 is a remainder elimination means; 226 is
an elementary back-substitution means; 227 is an elementary back-transmission means; 228 is an
elementary back-calculation means; 229 is a back-processing testing means.

The operation of the parallel linear computation method of the seventh embodiment is described below
with reference to Fig. 21.

In the first step, the data distribution means 220 distributes each i-th row of $A^{(0)}$ and i-th component of
$b^{(0)}$ and x to the cluster $CL_u$ such that $u = [i/P_c] - [[i/P_c]/C]C + 1$. Then the data distribution means 220
assigns each i-th row of $A^{(0)}$ and i-th component of $b^{(0)}$ distributed to the cluster $CL_u$ to the element
processor $PR_{u,v}$ such that $v = i - [i/P_c]P_c + 1$. Then the data distribution means initializes k as k = 0.
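
The two index formulas of the first step can be evaluated directly. The sketch below simply transcribes them, reading [s] as the greatest integer not exceeding s; the function name `row_owner` and its parameter names are assumptions made for the example.

```python
def row_owner(i, pc, c):
    """Return (u, v): cluster CL_u and element processor PR_{u,v} for row i.

    i  : 1-based row number
    pc : number of element processors per cluster (P_c)
    c  : number of clusters (C)
    Transcribes u = [i/P_c] - [[i/P_c]/C]C + 1 and v = i - [i/P_c]P_c + 1.
    """
    q = i // pc                 # [i/P_c]
    u = q - (q // c) * c + 1    # cluster index in 1..C
    v = i - q * pc + 1          # processor index in 1..P_c
    return u, v
```

For example, with $P_c = 4$ and C = 2 the formulas as stated place row 1 on $PR_{1,2}$, row 4 on $PR_{2,1}$, and row 8 back on cluster 1 as $PR_{1,1}$, which shows the cyclic wrap-around over clusters.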

In the second step, the elimination testing means 224 tests if the multi-pivot elimination means repeated
its operation $[n/P_c]$ times, that is, whether $k = [n/P_c]$. If it did, then the process jumps to the fifth step. If it
did not, the process goes to the third step.

In the third step, the elementary pre-elimination means 222 executes preliminary processing for the i-th
rows of reduced matrices and the corresponding known vectors such that $i = kP_c + l$ and $l = 1, \ldots, P_c$ in
this order. The processing involves a pivot choosing process for each l.

Methods of choosing a pivot are in general classified into either partial pivoting or full pivoting. Partial
pivoting chooses as a pivot in each reduced matrix $A^{(r)}$ an element with the largest absolute value in the
relevant column or row. Full pivoting chooses as a pivot in each reduced matrix $A^{(r)}$ an element with the
largest absolute value in the submatrix of the columns or rows which have not hitherto been pivotal.
Besides, if precision is not so important, then choosing a pivot is necessary only when the relevant
diagonal element is 0, and in that case any nonzero element can be chosen as a pivot in partial pivoting.
Pivoting methods in the present invention employ partial pivoting: the present first method chooses the
first nonzero element in the relevant row, and the present second method chooses an element with the
greatest absolute value in the relevant row.

Fig. 23 shows the process of the pivot choosing means 221. In Fig. 23, 240 is a search means; 241 is a
column number broadcasting means; 242 is an element interchange means; 243 is a component
interchange means.

In the present first method of pivot choosing, the element processor in charge of each i-th row, by the
search means 240, tests if $a_{ii}^{(i-1)} = 0$. If it is not, then the process terminates. If it is, then the element
processor, by the search means 240, searches for a nonzero element in the i-th row of $A^{(i-1)}$ from $a_{i,i+1}^{(i-1)}$ to
$a_{in}^{(i-1)}$ in this order. If $a_{ih}^{(i-1)}$ is the first such element, then the element processor, by the broadcasting
means 241, notifies each element processor of the column number h by a broadcast. Specifically, the
element processor either transmits h to a specified word of the shared memory 211 of each cluster, and
each element processor refers to the word, or the element processor transmits h to a dedicated bus line,
and each element processor fetches h into its local memory 204. Then each element processor, by the
element interchange means 242, simultaneously interchanges the element with the column number i with
the element with the column number h in the row in its charge. Then the two element processors in charge of
the i-th component and the h-th component of the unknown vector x respectively interchange these
components by the component interchange means 243. The pivot choosing process terminates hereby.

In the present second method of pivot choosing, the element processor in charge of each i-th row, by
the search means 240, sets Max = $|a_{ii}^{(i-1)}|$ and Col = i. The element processor then compares Max with
$|a_{ij}^{(i-1)}|$ for j = i + 1, ..., n in this order and updates Max and Col as Max = $|a_{ij}^{(i-1)}|$ and Col = j only if
$|a_{ij}^{(i-1)}|$ is greater than Max. Then the element processor notifies each element processor of Col by a
broadcast. The remaining steps are the same as above.
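
The two pivot choosing methods and the subsequent column interchange can be sketched in sequential form. This is an illustrative sketch only: it runs on a single processor, uses 0-based indices, and the names `choose_pivot_column`, `interchange_columns` and `unknown_labels` are assumptions made for the example rather than terms of the embodiment.

```python
def choose_pivot_column(row, i, method="max"):
    """Return the 0-based pivot column for the row whose diagonal index is i.

    method="first_nonzero": keep the diagonal if it is nonzero, otherwise take
    the first nonzero element to its right (None if there is none).
    method="max": take the element of greatest absolute value from column i on.
    """
    if method == "first_nonzero":
        if row[i] != 0:
            return i
        for j in range(i + 1, len(row)):
            if row[j] != 0:
                return j
        return None
    col, biggest = i, abs(row[i])
    for j in range(i + 1, len(row)):
        if abs(row[j]) > biggest:   # update only on a strictly greater value
            biggest, col = abs(row[j]), j
    return col


def interchange_columns(a, unknown_labels, i, h):
    """Swap columns i and h in every row and swap the matching unknowns."""
    for r in a:
        r[i], r[h] = r[h], r[i]
    unknown_labels[i], unknown_labels[h] = unknown_labels[h], unknown_labels[i]
```

In the parallel setting each element processor would apply the swap only to the row in its charge after receiving the broadcast column number; the loop over rows here stands in for that simultaneous interchange.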

In the process of the elementary pre-elimination means 222, if l = 1, then the element processor $PR_{u,1}$
in charge of the $(kP_c + 1)$th row in the cluster $CL_u$, where $u = k - [k/C]C + 1$, calculates (32) and (33), and
transmits the results to the shared memory of every other cluster to which the element processor in charge
of an i-th row such that $kP_c + 2 \le i \le n$ belongs. If $2 \le l \le P_c$, then each element processor in charge of the
i-th row such that $kP_c + l \le i \le n$ calculates (34), and the element processor $PR_{u,l}$ calculates (35) and (36).
Then after the pivot choosing means determines the pivot (37), the element processor $PR_{u,l}$ calculates (38)
and (39) and transmits the results to the shared memory of every other cluster to which the element
processor in charge of an i-th row such that $kP_c + l + 1 \le i \le n$ belongs.

In the fourth step, by the multi-pivot elimination means 223, each element processor in charge of the
i-th row such that $(k + 1)P_c + 1 \le i \le n$ calculates (40) and (41) for i.

In the fifth step, by the remainder elimination means 225, each element processor in charge of the i-th
row such that $[n/P_c]P_c + 1 \le i \le n$ executes the same operation as in the elementary pre-elimination means
222 for $l = 2, \ldots, n - [n/P_c]P_c$. Then this step initializes i as i = n, and goes to the sixth step.

In the sixth step, by the elementary back-substitution means 226, the element processor in charge of 
the i-th row calculates (42). 

In the seventh step, the back-processing testing means 229 tests if i = 1. If it is, then the solution of
the system of linear equations (2) has been obtained by the above elementary back-substitution as (44), and
the process terminates. If it is not, then the process proceeds to the eighth step.

In the eighth step, the elementary back-transmission means 227 transmits $x_i$ to the shared memory of
every cluster to which an element processor in charge of an h-th row such that $1 \le h \le i - 1$ belongs.

In the ninth step, by the elementary back-calculation means 228, each element processor in charge of the
h-th row such that $1 \le h \le i - 1$ calculates (43). Then this step decrements i by 1, and goes to the
sixth step.
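
The loop formed by the sixth through ninth steps can be sketched sequentially. Equations (42) and (43) are not reproduced in this passage, so the sketch assumes the usual form for a system whose pivot rows have already been divided by their pivots: (42) is taken as $x_i = b_i$ and (43) as $b_h \leftarrow b_h - a_{hi} x_i$. The function name `back_substitute` and the 0-based indexing are likewise assumptions made for the example.

```python
def back_substitute(a, b):
    """Solve a unit upper-triangular system a x = b by back-substitution.

    a : n x n list of lists with ones on the diagonal (assumed form)
    b : right-hand side; it is copied, not modified
    """
    n = len(b)
    b = list(b)
    x = [0] * n
    for i in range(n - 1, -1, -1):   # i = n, n - 1, ..., 1 in the text
        x[i] = b[i]                  # sixth step, assumed form of (42)
        for h in range(i):           # rows above i receive x_i (eighth step)
            b[h] -= a[h][i] * x[i]   # ninth step, assumed form of (43)
    return x
```

In the parallel method the inner loop is carried out simultaneously by the element processors in charge of rows 1 through i - 1, once $x_i$ has been transmitted to their shared memories.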

Fig. 22 is a block diagram of the parallel linear calculating method in the eighth embodiment of the present
invention implemented by a parallel computer structured as in the seventh embodiment. In Fig. 22, 220 is a
data distribution means; 221 is a pivot choosing means; 231 is an elimination testing means; 232 is an
elementary pre-elimination means; 233 is a multi-pivot elimination means; 234 is an elementary
post-elimination means; 235 is a post-elimination processing means; 236 is a remainder elimination means.

The operation of the parallel linear computation method of the eighth embodiment is described below
with reference to Fig. 22.

In the first step, the data distribution means 220 distributes each i-th row of $A^{(0)}$ and i-th component of
$b^{(0)}$ and x to the cluster $CL_u$ such that $u = [i/P_c] - [[i/P_c]/C]C + 1$. Then the data distribution means 220
assigns each i-th row of $A^{(0)}$ and i-th component of $b^{(0)}$ distributed to the cluster $CL_u$ to the element
processor $PR_{u,v}$ such that $v = i - [i/P_c]P_c + 1$. Then the data distribution means initializes k as k = 0.

In the second step, the elimination testing means 231 tests if the multi-pivot elimination means repeated
its operation $[n/P_c]$ times, that is, whether $k = [n/P_c]$. If it did, then the process jumps to the sixth step. If it
did not, the process goes to the third step.

In the third step, the elementary pre-elimination means 232 executes preliminary processing for the i-th
rows of the reduced matrices and the corresponding known vectors such that $i = kP_c + l$ and $l = 1, \ldots,
P_c$ in this order. The processing involves a pivot choosing process for each l, which is the same as in the
seventh embodiment.

In the pre-elimination means 232, if l = 1, then after the pivot choosing means 221 determines the pivot
(31), the element processor $PR_{u,1}$ in charge of the $(kP_c + 1)$th row in the cluster $CL_u$, where $u = k - [k/C]C
+ 1$, calculates (32) and (33), and transmits the results to the shared memory of every other cluster. If $2 \le l
\le P_c$, then each element processor in charge of the i-th row such that $kP_c + l \le i \le n$ calculates (34), and
the element processor $PR_{u,l}$ calculates (35) and (36). Then after the pivot choosing means determines the
pivot (37), the element processor $PR_{u,l}$ calculates (38) and (39) and transmits the results to the shared
memory of every other cluster.

In the fourth step, by the multi-pivot elimination means 233, each element processor in charge of the
i-th row such that $1 \le i \le kP_c$ or $(k + 1)P_c + 1 \le i \le n$ calculates (40) and (41).

In the fifth step, the post-elimination processing means 235 eliminates unnecessary elements generated
by the multi-pivot elimination means 233. The core of the post-elimination processing means 235 is the
elementary post-elimination means 234, which calculates (45) and (46) in the element processor in charge
of the i-th row.

By the post-elimination processing means, the element processor in charge of the $(kP_c + w)$th row
calculates (45) and (46), where $i = kP_c + w$ and $l = -w + q + 1$, from w = 1 to w = q for $q = 1, 2, \ldots,
P_c - 1$.

In the sixth step, by the remainder elimination means 236, each element processor in charge of the i-th
row such that $[n/P_c]P_c + 1 \le i \le n$ executes the operation of the elementary pre-elimination means 232.
Then the remainder elimination means executes the operation of the multi-pivot elimination means 233 followed
by the post-elimination processing means 235. The operation of the elementary pre-elimination means 232
should be executed for $l = 1, \ldots, n - [n/P_c]P_c$. The operation of the multi-pivot elimination means 233
should be executed by calculating (40) and (41) for $1 \le i \le [n/P_c]P_c$ and $k = [n/P_c]$. The operation of the
post-elimination processing means 235 should be executed from q = 1 to $q = n - [n/P_c]P_c$ for $k = [n/P_c]$.
The unknown vector x is obtained as the vector $b^{(r)}$ after the above operation.

If the preprocessing section $A_t$ and the postprocessing section $C_t$ have their own register sets, as the
updating sections B and B' do in the first through sixth embodiments, and their operations are
executed by retaining values of variables and divisors, then the number of load-and-store operations for the
memory is reduced, and further improvement in computation speed can be achieved.

In the seventh and eighth embodiments two components of the unknown vector should be interchanged
if the corresponding columns are interchanged by the pivoting means. However, it is not necessary to
actually transpose the components. By simply recording the correct position of the components after
each interchange of columns, the correct solution is obtained by taking these positions into account in the final
substitution to the components of the unknown vector.
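
The bookkeeping alternative described above can be sketched as follows: instead of moving components of x, a list of positions is updated at each column interchange and consulted once at the end. The names `record_interchange` and `restore_order` and the 0-based indexing are assumptions made for the example.

```python
def record_interchange(order, i, h):
    """Note that columns i and h were interchanged (no data of x is moved)."""
    order[i], order[h] = order[h], order[i]


def restore_order(solution, order):
    """Place each computed value at the original index of its unknown.

    solution[pos] is the value computed for column position pos;
    order[pos] is the original index of the unknown now in that position.
    """
    x = [None] * len(solution)
    for pos, original in enumerate(order):
        x[original] = solution[pos]
    return x
```

Starting from `order = [0, 1, 2]`, interchanging columns 0 and 2 and then columns 1 and 2 leaves `order = [2, 0, 1]`, so computed values `[10, 20, 30]` are returned as `[20, 30, 10]`.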

Thus the present invention provides high-speed linear calculating equipment and parallel linear
calculating equipment for solving systems of linear equations by means of Gauss's elimination method and
Gauss-Jordan's method based on multi-pivot simultaneous elimination and scalar operations. The speed-up
is achieved by reducing the number of load-and-store operations for the memory by retaining values of
variables in register sets in updating processing, and by reducing the number of iterations by multi-pivot
simultaneous elimination. The present invention is also easily implemented in scalar computers. In fact, an
experiment done in a scalar computer by means of software showed that Gauss's method and
Gauss-Jordan's method based on 8-pivot simultaneous elimination were 2.5 times faster than the original Gauss's
elimination method and Gauss-Jordan's elimination method.

As for the parallel calculating equipment of the third through sixth embodiments and of the seventh and
eighth embodiments, each memory is assigned blocks of k rows of the coefficient matrix $A^{(0)}$ for the k-pivot
simultaneous elimination method, so that effects of parallel computation are enhanced. In the fifth and sixth
embodiments, where element processors are clustered, the preprocessing or both the preprocessing and
the postprocessing are also made parallel, and the computation is more effective. In these embodiments, a
theoretical estimation has shown that if the number of components of the unknown vector x is sufficiently
large for a definite number of processors, then the effects of parallel computation are sufficiently powerful.
Therefore, parallel linear calculating equipment effectively employing Gauss's method and Gauss-Jordan's
method based on multi-pivot simultaneous elimination has been obtained.

Furthermore, the present invention effectively makes possible high-speed parallel computation for 
solving systems of linear equations using a parallel computer with a number of element processors by 
means of the methods of the seventh and eighth embodiments. 

Although the present invention has been fully described in connection with the preferred embodiments
thereof with reference to the accompanying drawings, it is to be noted that various changes and
modifications are apparent to those skilled in the art. Such changes and modifications are to be understood
as included within the scope of the present invention as defined by the appended claims unless they depart
therefrom.

Claims

1. Linear calculating equipment comprising: 

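a memory that stores a plurality of coefficient matrices $A^{(r)}$ and vectors $b^{(r)}$ and an unknown vector
x expressed by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$b^{(r)} = (b_1^{(r)}, b_2^{(r)}, \ldots, b_n^{(r)})^t,$

$x = (x_1, x_2, \ldots, x_n)^t,$

where r = 0, 1, ..., n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$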

a pivoting section connected to said memory for choosing a pivot in the i-th row of $A^{(i-1)}$ and
interchanging the i-th column with the chosen pivotal column for $1 \le i \le n$,

a preprocessing section $A_1$ which, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)}$

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$





EP 0 523 544 A2 

for $pk + 2 \le j \le n$ and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)},$

a plurality of preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates



t-2 



iD-l 



a^- 1 > <= a^i - E Reg£V . 

J8-1 



for $pk + t \le j \le n$, and, immediately after said pivot choosing section determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)},$

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}$

for $pk + t + 1 \le j \le n$,

an updating section which is connected to said memory, comprises a register set of k registers and
an arithmetic unit, and calculates



0-1 



for $(p + 1)k + 1 \le i, j \le n$ retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,



a back-substitution section which is connected to said memory and obtains the value of said 
unknown vector x by calculating 



*i - hi* 



and 



for $1 \le h \le i - 1$ for i = n, n - 1, ..., 1 in this order of i, and

a main controller G which, if n is a multiple of said k, instructs said pivot choosing section, said
preprocessing sections $A_1, \ldots, A_k$, and said updating section to repeat their operations for p = 0, 1, ...,
n/k - 2, and instructs said pivot choosing section and said preprocessing sections $A_1, \ldots, A_k$ to
execute their operations for p = n/k - 1, and, if n is not a multiple of said k, instructs said pivot choosing
section, said preprocessing sections $A_1, \ldots, A_k$, and said updating section B to repeat their operations
for p = 0, 1, ..., [n/k] - 1, where [s] denotes the greatest integer not exceeding s, and instructs said
pivot choosing section and said preprocessing sections $A_1, \ldots, A_{n-[n/k]k}$ to execute their operations,
and in both cases, instructs said back-substitution section to obtain said unknown vector.

2. Linear calculating equipment comprising: 

a memory which stores a plurality of coefficient matrices $A^{(r)}$ and vectors $b^{(r)}$ and an unknown
vector x expressed by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$b^{(r)} = (b_1^{(r)}, b_2^{(r)}, \ldots, b_n^{(r)})^t,$

$x = (x_1, x_2, \ldots, x_n)^t,$

where r = 0, 1, ..., n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

a pivot choosing section which is connected to said memory, chooses a pivot in the i-th row of
$A^{(i-1)}$, and interchanges the i-th column with the chosen pivotal column for $1 \le i \le n$,

a preprocessing section $A_1$ which, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)}$

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$

for $pk + 2 \le j \le n$ and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)},$

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates
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JB-l 



c-1 



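for $pk + t \le j \le n$, and, immediately after said pivot choosing section determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)},$

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}$

for $pk + t + 1 \le j \le n$,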

an updating section which is connected to said memory, comprises a register set of k registers and 
an arithmetic unit, and calculates 
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Reg}* 



J»-l 



b iMH a b^-J^ Reg^bgZ* 



for $1 \le i \le pk$, $(p + 1)k + 1 \le i \le n$, and $(p + 1)k + 1 \le j \le n$ if n is a multiple of k or p < [n/k], and for
$1 \le i \le [n/k]k$, $[n/k]k + 1 \le j \le n$ otherwise, where [s] denotes the greatest integer not exceeding s,
retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,

k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to said
memory and calculates



Reg™ -^K&m. 



Reg™ -^SSUi. 



Keg* - £p**ep*+e»i' 
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10 






• • • # 



Jb^ t41> = JbJK* - Hag™ 2j^» , 



USE- 11 - itfE* - «eg<«> I^RE 11 



for $pk + t + 2 \le j \le n$,

a main controller that, if n is a multiple of k, instructs said pivot choosing section, said preprocessing
sections $A_1, \ldots, A_k$, said updating section B', and said postprocessing sections $C_1, \ldots, C_{k-1}$ to
repeat their operations for p = 0, 1, ..., n/k - 1, and, if n is not a multiple of k, instructs said pivot
choosing section, said preprocessing sections $A_1, \ldots, A_k$, the updating section, and said postprocessing
sections $C_1, \ldots, C_{k-1}$ to repeat their operations for p = 0, 1, ..., [n/k] - 1, and instructs said pivot
choosing section, said preprocessing sections $A_1, \ldots, A_{n-[n/k]k}$, said updating section B', and said
postprocessing sections $C_1, \ldots, C_{n-[n/k]k}$ to execute their operations for p = [n/k].

3. Linear calculating equipment comprising: 
a network, 

a main controller which is connected to said network and executes flow control,

a plurality of nodes $\sigma_u$, where u = 0, 1, ..., P - 1, each of which is connected to each other by
said network and comprises:

a memory which stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k
components of each known vector $b^{(r)}$ and an unknown vector x assigned to said node and expressed
by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$b^{(r)} = (b_1^{(r)}, b_2^{(r)}, \ldots, b_n^{(r)})^t,$

$x = (x_1, x_2, \ldots, x_n)^t$

for $0 \le r \le n$, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

a pivot choosing section which is connected to said memory, chooses a pivot in the i-th row of
$A^{(i-1)}$, and interchanges the i-th column with the chosen pivotal column for i in said node's charge,

a preprocessing section $A_1$ which, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)} \qquad (47)$

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)} \qquad (48)$

for $pk + 2 \le j \le n$ and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)} \qquad (49)$

for pk + 1 in said node's charge,

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates

30 

Reg$ t = aXlp*+x * (50) 
* Reg&U = - Reg£l t ajggjjU , (51) 



t-2 



a p*+tJ X> = a pf£lj " J]) RGffpSc*? a pk^mJ ) 9 (53) 



t-X 



1 ' 



for $pk + t \le j \le n$ for pk + t in said node's charge, and, immediately after said pivot choosing section
determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad (55)$

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad (56)$

$b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)} \qquad (57)$

for $pk + t + 1 \le j \le n$ for pk + t in said node's charge,

an updating section which is connected to said memory, comprises a register set of k registers and 
an arithmetic unit, and calculates 



Reg}* -a^, (58) 



- ajpto a p*<ap**3* (59) 



Reg}™ « " £ W*» <*ftn>*. (60 ) 



Jt 

n-1 



(61) 



(62) 



for $(p + 1)k + 1 \le j \le n$ for i in said node's charge retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,

a back-substitution section which is connected to said memory and obtains the value of said 
unknown vector x by calculating 

*i ■ (63) 
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and 



b^ M) -^ fl (64) 

for i and h in said node's charge,

a gateway which is connected to said memory and is a junction with the outside, and
a transmitter which is connected to said memory and transmits data between said memory and the
outside of said node through said gateway.

4. The equipment as claimed in claim 3, further including:

parallel preprocessing $A_1$ wherein if the (pk + 1)th through (p + 1)k-th rows of said $A^{(0)}$ and
corresponding components of $b^{(0)}$ and x are assigned to said node $\sigma_u$, then said pivot choosing section
of said node $\sigma_u$ determines the pivot (47), and said preprocessing section of said node $\sigma_u$ calculates
(48) and (49) for $pk + 2 \le j \le n$, and said transmitter transmits the results to said memory of said every
other node through said gateway, while said updating section B of said node in charge of the i-th row
calculates (58) for every i such that $(p + 1)k + 1 \le i \le n$,

parallel preprocessing $A_t$, where t = 2, 3, ..., k, wherein said preprocessing section $A_t$ of said node
$\sigma_u$ calculates (50) through (54) for $pk + t \le j \le n$, and, immediately after said pivot choosing section of
$\sigma_u$ determines the pivot (55), calculates (56) and (57) for $pk + t + 1 \le j \le n$, and said transmitter
transmits the results to said memory of said every other node through said gateway, while said
updating section B of said node in charge of the i-th row calculates



25 



Reg}"* - a$l t - £ Regf* a^^ e (65) 



for every i such that $(p + 1)k + 1 \le i \le n$,

parallel updating means wherein said updating section B of said each node in charge of the i-th row
such that $(p + 1)k + 1 \le i \le n$ calculates (58) through (62) for $(p + 1)k + 1 \le j \le n$ retaining the values
of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set, and

back-substitution wherein said back-substitution section of said each node in charge of the i-th and
h-th components calculates (63) and (64) for $1 \le h \le i - 1$ for i = n, n - 1, ..., 1 in this order of i.

5. The equipment as claimed in claim 4 wherein said main controller distributes and assigns the rows of
said coefficient matrix $A^{(0)}$ and the components of said $b^{(0)}$ and x to said nodes in such a manner as
each block of consecutive said k rows and corresponding 2k components is transmitted to said
memory of said one node in the cyclic order of $\sigma_0, \ldots, \sigma_{P-1}, \sigma_0, \sigma_1, \ldots$, and, if n is a multiple of
k, instructs said each node to execute said parallel preprocessing $A_1$ through $A_k$ and said parallel
updating B for p = 0, 1, ..., n/k - 2 and to execute said parallel preprocessing $A_1$ through $A_k$ for p =
n/k - 1, and, if n is not a multiple of k, instructs said each node to execute said parallel preprocessing
$A_1$ through $A_k$ and said parallel updating B for p = 0, 1, ..., [n/k] - 1, where [s] denotes the greatest
integer not exceeding s, and to execute said parallel preprocessing $A_1$ through $A_{n-[n/k]k}$ for p = [n/k],
and instructs said nodes to obtain said unknown vector x by executing back-substitution.

6. Linear calculating equipment comprising:
a network,

a main controller that is connected to said network, and executes flow control, and

a plurality of nodes $\sigma_u$, where u = 0, 1, ..., P - 1, each of which is connected to each other by
said network and comprises:

a memory that stores a plurality of coefficient matrices $A^{(r)}$ and vectors $b^{(r)}$ and an unknown vector
x expressed by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$b^{(r)} = (b_1^{(r)}, b_2^{(r)}, \ldots, b_n^{(r)})^t,$

$x = (x_1, x_2, \ldots, x_n)^t,$

where r = 0, 1, ..., n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

a pivoting section that is connected to said memory, chooses a pivot in the i-th row of $A^{(i-1)}$, and
interchanges the i-th column with the chosen pivotal column for i in said node's charge,

a preprocessing section $A_1$ that, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)} \qquad (66)$

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)} \qquad (67)$

for $pk + 2 \le j \le n$ and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)} \qquad (68)$

for pk + 1 in said node's charge,

k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates
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Regglt 55 a£lp*+i * (69) 
Reg£*lt ° &pi£lpx+2 " Reff$t *p£ifb>2 * (70) 



a) ) M ( 1 ( 



t-1 



ffl-1 * * 



J^"' - UK - £ (73) 



25 "S3 



for $pk + t \le j \le n$ for pk + t in said node's charge, and, immediately after said pivot choosing section
determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad (74)$

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}, \qquad (75)$

$b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)} \qquad (76)$

for $pk + t + 1 \le j \le n$ for pk + t in said node's charge,

an updating section which is connected to said memory, comprises a register set of k registers and 
an arithmetic unit, and calculates 



55 
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(77) 
(78) 



Jt-i 



(79) 



js-1 



(80) 



is-i 



(81) 



for $(p + 1)k + 1 \le j \le n$ for i in said node's charge retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set, and

k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to said
memory and calculates



(82) 
(83) 



Po »(t-l) _ - (P**e) 



(84) 
(85) 
(86) 
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35 



40 



50 



XtfT 11 - Jt^ e> - 2*^° . (88) 



(89) 



USSfT 11 - *>S&* - «ag fM » irfST' (90) 

for $pk + t + 2 \le j \le n$ for pk + t in said node's charge,

a gateway which is connected to said memory and is a junction with the outside, and
a transmitter which is connected to said memory and transmits data between said memory and the
outside of said node through said gateway.

7. The equipment as claimed in claim 6, further including:

parallel preprocessing $A_1$ wherein if the (pk + 1)th through (p + 1)k-th rows of said $A^{(0)}$ and
corresponding components of said $b^{(0)}$ and x are assigned to said node $\sigma_u$, then said pivot choosing
section of said node $\sigma_u$ determines the pivot (66), and said preprocessing section of said node $\sigma_u$
calculates (67) and (68) for $pk + 2 \le j \le n$, and said transmitter transmits the results to said memory of
said every other node through said gateway, while said updating section B' of said node in charge of
the i-th row calculates (77) for every i such that $(p + 1)k + 1 \le i \le n$,

parallel preprocessing $A_t$, where t = 2, 3, ..., k, wherein said preprocessing section $A_t$ of said
node $\sigma_u$ calculates (69) through (73) for $pk + t \le j \le n$, and, immediately after said pivot choosing
section of $\sigma_u$ determines the pivot (74), calculates (75) and (76) for $pk + t + 1 \le j \le n$, and said
transmitter transmits the results to said memory of said every other node through said gateway, while
said updating section B' of said node in charge of the i-th row calculates



(91) 



for every i such that $(p + 1)k + 1 \le i \le n$,

parallel updating means wherein said updating section B' of said each node in charge of the i-th
row such that $1 \le i \le pk$ or $(p + 1)k + 1 \le i \le n$ if n is a multiple of k or p < [n/k], where [s] denotes
the greatest integer not exceeding s, and $1 \le i \le [n/k]k$ otherwise calculates (77) through (81) for
$(p + 1)k + 1 \le j \le n$ if n is a multiple of k or p < [n/k] and for $[n/k]k + 1 \le j \le n$ otherwise, retaining the
values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set, and

post-elimination C wherein said postprocessing sections $C_t$ of said node $\sigma_u$ calculate (82) through
(90) for $pk + t + 2 \le j \le n$ for t = 1, 2, ..., k - 1 if n is a multiple of k or p < [n/k] and for t = 1, 2, ...,
n - [n/k]k otherwise.

8. The equipment as claimed in claim 7 wherein said main controller distributes the rows of said
coefficient matrix $A^{(0)}$ and the components of said $b^{(0)}$ and x to said nodes in such a manner as each
block of consecutive k rows and corresponding 2k components is transmitted to said memory of said
one node in the cyclic order of $\alpha_0$, ..., $\alpha_{P-1}$, $\alpha_0$, $\alpha_1$, ..., and, if n is a multiple of k,
instructs said each node to execute said parallel preprocessing $A_1$ through $A_k$, said parallel updating B'
and said post-elimination C for p = 0, 1, ..., n/k - 1 and, if n is not a multiple of k, instructs said
each node to execute said parallel preprocessing $A_1$ through $A_k$, said parallel updating B' and said
post-elimination C for p = 0, 1, ..., [n/k] - 1 and to execute said parallel preprocessing $A_1$ through
$A_{n-[n/k]k}$, said parallel updating B' and said post-elimination C for p = [n/k].
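The cyclic distribution and the loop over p described in this claim can be illustrated with a short sketch. It is only an illustration under assumptions: the function names `distribute_blocks` and `elimination_schedule` are hypothetical, rows are numbered from 1 as in the claims, and nodes are numbered from 0.

```python
def distribute_blocks(n, k, num_nodes):
    """Map each 1-based row index to the 0-based node that stores it.

    Blocks of k consecutive rows are handed out in cyclic order
    (node 0, 1, ..., num_nodes - 1, 0, 1, ...). The last block is
    shorter when n is not a multiple of k.
    """
    owner = {}
    for row in range(1, n + 1):
        block = (row - 1) // k           # block index, i.e. the parameter p
        owner[row] = block % num_nodes   # cyclic assignment of blocks to nodes
    return owner


def elimination_schedule(n, k):
    """Return (p, number_of_pivots) pairs for the outer loop over p.

    Full steps use k pivots; if n is not a multiple of k, a final step
    with p = [n/k] uses the remaining n - [n/k]k pivots.
    """
    full, rest = divmod(n, k)
    steps = [(p, k) for p in range(full)]
    if rest:
        steps.append((full, rest))
    return steps


if __name__ == "__main__":
    print(distribute_blocks(7, 2, 3))
    print(elimination_schedule(7, 2))
```

With n = 7, k = 2 and three nodes, rows 1 and 2 go to node 0, rows 3 and 4 to node 1, rows 5 and 6 to node 2, and the single remaining row 7 wraps around to node 0.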

9. Linear calculating equipment comprising:
a network,

a main controller which is connected to said network and executes flow control, and

a plurality of clusters $CL_u$, where u = 0, 1, ..., P - 1, each of which is connected to each other by
said network and comprises:

a memory which stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k
components of each known vector $b^{(r)}$ and an unknown vector x assigned to said node and expressed
by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$x = (x_1, x_2, \ldots, x_n)^t$

for 0 ≤ r ≤ n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

a C gateway which is a junction with the outside of said cluster,

a transmitter which transmits data between said memory and the outside of said cluster through 
said C gateway, and 

a plurality of element processors $PE_t$, where t = 1, 2, ..., $P_c$, each of which is connected to said
memory and comprises:

a pivot choosing section which is connected to said memory, chooses a pivot in the i-th row of
$A^{(i-1)}$, and interchanges the i-th column with the chosen pivotal column for i in said element processor's
charge,

a preprocessing section $A_1$ that, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)}$ (92)

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$ (93)

for pk + 2 ≤ j ≤ n and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$ (94)

for pk + 1 in said element processor's charge,

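k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates

$\mathrm{Reg}_{pk+t}^{(0)} = a_{pk+t,\,pk+1}^{(pk)},$ (95)

$\mathrm{Reg}_{pk+t}^{(1)} = a_{pk+t,\,pk+2}^{(pk)} - \mathrm{Reg}_{pk+t}^{(0)} a_{pk+1,\,pk+2}^{(pk+1)},$ (96)

$\vdots$

$\mathrm{Reg}_{pk+t}^{(t-2)} = a_{pk+t,\,pk+t-1}^{(pk)} - \sum_{m=1}^{t-2} \mathrm{Reg}_{pk+t}^{(m-1)} a_{pk+m,\,pk+t-1}^{(pk+m)},$ (97)

$a_{pk+t,\,j}^{(pk+t-1)} = a_{pk+t,\,j}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_{pk+t}^{(m-1)} a_{pk+m,\,j}^{(pk+m)},$ (98)

$b_{pk+t}^{(pk+t-1)} = b_{pk+t}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_{pk+t}^{(m-1)} b_{pk+m}^{(pk+m)}$ (99)

for pk + t ≤ j ≤ n, and, immediately after said pivot choosing section determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)}$, (100)

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)},$ (101)

$b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}$ (102)

for pk + t + 1 ≤ j ≤ n,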

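an updating section B that is connected to said memory, comprises a register set of k registers and
an arithmetic unit, and calculates

$\mathrm{Reg}_i^{(0)} = a_{i,\,pk+1}^{(pk)},$ (103)

$\mathrm{Reg}_i^{(1)} = a_{i,\,pk+2}^{(pk)} - \mathrm{Reg}_i^{(0)} a_{pk+1,\,pk+2}^{(pk+1)},$ (104)

$\vdots$

$\mathrm{Reg}_i^{(k-1)} = a_{i,\,pk+k}^{(pk)} - \sum_{m=1}^{k-1} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,pk+k}^{(pk+m)},$ (105)

$a_{ij}^{((p+1)k)} = a_{ij}^{(pk)} - \sum_{m=1}^{k} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,j}^{(pk+m)},$ (106)

$b_i^{((p+1)k)} = b_i^{(pk)} - \sum_{m=1}^{k} \mathrm{Reg}_i^{(m-1)} b_{pk+m}^{(pk+m)}$ (107)

for (p + 1)k + 1 ≤ j ≤ n for i in said element processor's charge retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,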

a back-substitution section which is connected to said memory and obtains the value of said
unknown vector x by calculating

$x_i = b_i^{(n)}$ (108)

and

^fcnwni .j^wi {109) 

for i and h in said element processor's charge, and

a gateway which is connected to said memory and is a junction with the outside.

10. The equipment as claimed in claim 9, further including:

parallel preprocessing $CLA_1$ wherein, if the (pk + 1)th through (p + 1)k-th rows of said $A^{(0)}$ and
corresponding components of said $b^{(0)}$ and x are assigned to said cluster $CL_u$, then said pivot choosing
section of said element processor $PE_1$ of said $CL_u$ determines the transposed pivot (92) of the
(pk + 1)th row, and said preprocessing sections $A_1$ of said element processors of $CL_u$ simultaneously
calculate (93) and (94) for pk + 2 ≤ j ≤ n and (95) with said each $A_1$ calculating for elements and
components in its charge, and said transmitter transmits the results to said memory of said every other
cluster through said C gateway, while said updating section of said element processor in charge of the
i-th row calculates (103) for every i such that (p + 1)k + 1 ≤ i ≤ n,

parallel preprocessing $CLA_t$, where 2 ≤ t ≤ k, wherein said preprocessing sections $A_t$ of said cluster
$CL_u$ simultaneously calculate (95) through (99) for pk + t ≤ j ≤ n with said each $A_t$ calculating for
elements and components in its charge and, immediately after said pivot choosing section of said $PE_t$ of
said $CL_u$ determines the pivot (100), simultaneously calculate (101) and (102) for pk + t + 1 ≤ j ≤ n,
and said transmitter transmits the results to said memory of said every other cluster through said C
gateway, while said updating section of said element processor in charge of the i-th row calculates

$\mathrm{Reg}_i^{(t-1)} = a_{i,\,pk+t}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,pk+t}^{(pk+m)}$ (110)

for every i such that (p + 1)k + 1 ≤ i ≤ n, and

parallel updating means wherein said updating sections B of said each element processor in
charge of the i-th row such that (p + 1)k + 1 ≤ i ≤ n calculate (103) through (107) for
(p + 1)k + 1 ≤ j ≤ n retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set, and

back-substitution wherein said back-substitution section of said each cluster in charge of the i-th and
h-th components calculates (108) and (109) for 1 ≤ h ≤ i - 1 for i = n, n - 1, ..., 1 in this order of i.
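As a reading aid, the following sketch emulates on a single processor the sequence described in claims 9 and 10: preprocessing of the pivot rows of a block, simultaneous elimination of all pivots of the block from the rows below, and back-substitution. It is a sketch under assumptions, not the claimed equipment: the function name `solve_multi_pivot` is hypothetical, indices are 0-based, the work the claims spread over clusters and element processors is done by ordinary loops, and the pivot of each row is taken as the entry of greatest absolute value, which is one of the two search rules the pivot choosing claims describe.

```python
def solve_multi_pivot(A, b, k):
    """Solve A x = b by Gauss elimination with up to k pivots per step."""
    n = len(A)
    a = [[float(v) for v in row] for row in A]
    rhs = [float(v) for v in b]
    order = list(range(n))  # order[c] = original index of the unknown in column c

    for start in range(0, n, k):
        size = min(k, n - start)          # the last step may hold fewer pivots
        # Preprocessing of the pivot rows of this block, one row after another.
        for t in range(size):
            r = start + t
            for q in range(start, r):     # remove the earlier pivots of the block
                reg = a[r][q]
                for j in range(q, n):
                    a[r][j] -= reg * a[q][j]
                rhs[r] -= reg * rhs[q]
            # Pivot search along row r, followed by a column interchange.
            c = max(range(r, n), key=lambda j: abs(a[r][j]))
            if a[r][c] == 0.0:
                raise ValueError("matrix is singular")
            if c != r:
                for row in a:
                    row[r], row[c] = row[c], row[r]
                order[r], order[c] = order[c], order[r]
            pivot = a[r][r]
            for j in range(r, n):
                a[r][j] /= pivot
            rhs[r] /= pivot
        # Updating: each row below the block removes all pivots of the block at once.
        for i in range(start + size, n):
            regs = []                     # multipliers kept in the register set
            for m in range(size):
                q = start + m
                regs.append(a[i][q] - sum(regs[l] * a[start + l][q] for l in range(m)))
            for j in range(start + size, n):
                a[i][j] -= sum(regs[m] * a[start + m][j] for m in range(size))
            rhs[i] -= sum(regs[m] * rhs[start + m] for m in range(size))
            for m in range(size):
                a[i][start + m] = 0.0

    # Back-substitution for i = n, n - 1, ..., 1 (0-based indices here).
    for i in range(n - 1, -1, -1):
        for h in range(i):
            rhs[h] -= a[h][i] * rhs[i]
    # Undo the column interchanges so that x is in the original order.
    x = [0.0] * n
    for col, original in enumerate(order):
        x[original] = rhs[col]
    return x


if __name__ == "__main__":
    print(solve_multi_pivot([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3], 2))
```

The list `regs` plays the role of the register set: each multiplier is computed once per row and reused for every column j beyond the block, which is what lets k columns be eliminated in one pass over the row.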

11. The equipment as claimed in claim 10 wherein said main controller distributes and assigns the rows of
said coefficient matrix $A^{(0)}$ and the components of said $b^{(0)}$ and x to said clusters in such a manner as
each block of consecutive k rows and corresponding 2k components is transmitted to said memory of
said one cluster in the cyclic order of $CL_0$, ..., $CL_{P-1}$, $CL_0$, $CL_1$, ..., and, if n is a multiple of k,
instructs said each cluster to execute said parallel preprocessing $CLA_1$ through $CLA_k$ and said parallel
updating $B_c$ for p = 0, 1, ..., n/k - 2 and to execute said $CLA_1$ through $CLA_k$ for p = n/k - 1, and, if n
is not a multiple of k, instructs said each cluster to execute said $CLA_1$ through $CLA_k$ and said $B_c$ for
p = 0, 1, ..., [n/k] - 1 and to execute said $CLA_1$ through $CLA_{n-[n/k]k}$ for p = [n/k], and instructs said
each cluster to obtain said unknown vector x by means of said back-substitution.

12. Linear calculating equipment comprising:
a network,

a main controller that is connected to said network and executes flow control, and

a plurality of clusters $CL_u$, where u = 0, 1, ..., P - 1, each of which is connected to each other by
said network and comprises:

a memory that stores blocks of k rows of each coefficient matrix $A^{(r)}$ and corresponding k
components of each known vector $b^{(r)}$ and an unknown vector x assigned to said node and expressed
by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$x = (x_1, x_2, \ldots, x_n)^t$

for 0 ≤ r ≤ n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

a C gateway which is a junction with the outside of said cluster,

a transmitter which transmits data between said memory and the outside of said cluster through
said C gateway, and

a plurality of element processors $PE_t$, where t = 1, 2, ..., $P_c$, each of which is connected to said
memory and comprises:

a pivot choosing section which is connected to said memory, chooses a pivot in the i-th row of
$A^{(i-1)}$, and interchanges the i-th column with the chosen pivotal column for i in said element processor's
charge,

a preprocessing section $A_1$ that, immediately after said pivot choosing section determines the
transposed pivot

$a_{pk+1,\,pk+1}^{(pk)}$ (111)

for a given positive integer k and a value of a nonnegative integral parameter p, calculates

$a_{pk+1,\,j}^{(pk+1)} = a_{pk+1,\,j}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$ (112)

for pk + 2 ≤ j ≤ n and

$b_{pk+1}^{(pk+1)} = b_{pk+1}^{(pk)} / a_{pk+1,\,pk+1}^{(pk)}$ (113)

for pk + 1 in said element processor's charge,

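k - 1 preprocessing sections $A_t$, where t = 2, 3, ..., k, each of which is connected to said
memory and calculates

$\mathrm{Reg}_{pk+t}^{(0)} = a_{pk+t,\,pk+1}^{(pk)},$ (114)

$\mathrm{Reg}_{pk+t}^{(1)} = a_{pk+t,\,pk+2}^{(pk)} - \mathrm{Reg}_{pk+t}^{(0)} a_{pk+1,\,pk+2}^{(pk+1)},$ (115)

$\vdots$

$\mathrm{Reg}_{pk+t}^{(t-2)} = a_{pk+t,\,pk+t-1}^{(pk)} - \sum_{m=1}^{t-2} \mathrm{Reg}_{pk+t}^{(m-1)} a_{pk+m,\,pk+t-1}^{(pk+m)},$ (116)

$a_{pk+t,\,j}^{(pk+t-1)} = a_{pk+t,\,j}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_{pk+t}^{(m-1)} a_{pk+m,\,j}^{(pk+m)},$ (117)

$b_{pk+t}^{(pk+t-1)} = b_{pk+t}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_{pk+t}^{(m-1)} b_{pk+m}^{(pk+m)}$ (118)

for pk + t ≤ j ≤ n, and, immediately after said pivot choosing section determines the transposed pivot

$a_{pk+t,\,pk+t}^{(pk+t-1)}$, (119)

calculates

$a_{pk+t,\,j}^{(pk+t)} = a_{pk+t,\,j}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)},$ (120)

$b_{pk+t}^{(pk+t)} = b_{pk+t}^{(pk+t-1)} / a_{pk+t,\,pk+t}^{(pk+t-1)}$ (121)

for pk + t + 1 ≤ j ≤ n,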

an updating section which is connected to said memory, comprises a register set of k registers and
an arithmetic unit, and calculates

(122)

(123)

(124)

(125)

(126)

in said element processor's charge retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,



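an updating section which is connected to said pivot choosing section, comprises a register set of k
registers and an arithmetic unit, and calculates

$\mathrm{Reg}_i^{(0)} = a_{i,\,pk+1}^{(pk)},$ (127)

$\mathrm{Reg}_i^{(1)} = a_{i,\,pk+2}^{(pk)} - \mathrm{Reg}_i^{(0)} a_{pk+1,\,pk+2}^{(pk+1)},$ (128)

$\vdots$

$\mathrm{Reg}_i^{(k-1)} = a_{i,\,pk+k}^{(pk)} - \sum_{m=1}^{k-1} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,pk+k}^{(pk+m)},$ (129)

$a_{ij}^{((p+1)k)} = a_{ij}^{(pk)} - \sum_{m=1}^{k} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,j}^{(pk+m)},$ (130)

$b_i^{((p+1)k)} = b_i^{(pk)} - \sum_{m=1}^{k} \mathrm{Reg}_i^{(m-1)} b_{pk+m}^{(pk+m)}$ (131)

for (p + 1)k + 1 ≤ j ≤ n retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in said register set,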



k - 1 postprocessing sections $C_t$, where t = 1, 2, ..., k - 1, each of which is connected to said
pivot choosing section and calculates

Reg™ - aJg£U*i, (132) 
Reg™ » aJS£U*i. (133) 



* e <7 u -i> - ajgt&a. (134) 

•JEST' - - *^ (0> «jSKAi l . d35) 
-JST* " 1*8? - - (136)

• • • # 

*jer» - - w liSRS 1 * * das) 

Jb^T 1 ' = 2^ el - iw« Jb££V 1> . (139) 

irffT 11 - - Reg™ b££» ( i 4 o) 

Reg™ " a$Z&.t.i. (141) 

K<^ (1) = aJK^U^, (142) 

• • • / 

Jteg<**> = a$;&. t . lt (143) 

c **S? - w w JKsy . d44) 

a$Zr ] - *K8» - »g« ^KSff . (145) 

• • • f 

a^r 1 - - rfssy . d46> 

J^T* 1 ' - Jjjftr* - l#5S*» , (147) 

- - «V W bgtZ" . (148) 

J^T" - bjg? - bgZZ" (149) 
for pk + t + 2 ≤ j ≤ n, and

a gateway which is connected to said pivot choosing section and is a junction with the outside of
said element processor.

13. The equipment as claimed in claim 12, further including:

parallel preprocessing $CLA_1$ wherein, if the (pk + 1)th through (p + 1)k-th rows of said $A^{(0)}$ and
corresponding components of said $b^{(0)}$ and x are assigned to said cluster $CL_u$, then said pivot choosing
section of said element processor $PE_1$ of said $CL_u$ determines the transposed pivot (111) of the
(pk + 1)th row, and said preprocessing sections $A_1$ of said element processors of said $CL_u$ simultaneously
calculate (112) and (113) for pk + 2 ≤ j ≤ n with said each $A_1$ calculating for elements and components
in its charge, and said transmitter transmits the results to said memory of said every other cluster
through said C gateway, while said updating section B' of said element processor in charge of the i-th
row calculates (122) for every i such that (p + 1)k + 1 ≤ i ≤ n,

parallel preprocessing $A_t$, where t = 2, 3, ..., k, wherein said preprocessing sections $A_t$ of said
element processors of said cluster $CL_u$ simultaneously calculate (114) through (118) for pk + t ≤ j ≤ n
with said each $A_t$ calculating for elements and components in its charge and, immediately after said
pivot choosing section of said $PE_t$ of said $CL_u$ determines the pivot (119), simultaneously calculate
(120) and (121) for pk + t + 1 ≤ j ≤ n, and said transmitter transmits the results to said memory of
said every other cluster through said C gateway, while said updating section B' of said element
processor in charge of the i-th row calculates

$\mathrm{Reg}_i^{(t-1)} = a_{i,\,pk+t}^{(pk)} - \sum_{m=1}^{t-1} \mathrm{Reg}_i^{(m-1)} a_{pk+m,\,pk+t}^{(pk+m)}$ (150)

for every i such that (p + 1)k + 1 ≤ i ≤ n,

parallel updating $B'_c$ wherein said updating section B' of said each element processor in charge of
the i-th row such that 1 ≤ i ≤ pk or (p + 1)k + 1 ≤ i ≤ n if n is a multiple of k or p < [n/k] and
1 ≤ i ≤ [n/k]k otherwise also calculates (122) through (126) for (p + 1)k + 1 ≤ j ≤ n if n is a multiple of k
or p < [n/k] and for [n/k]k + 1 ≤ j ≤ n otherwise, retaining the values of

$\mathrm{Reg}_i^{(0)}, \ldots, \mathrm{Reg}_i^{(k)}$

in the register set, and



post-elimination $C_c$ wherein said postprocessing sections $C_t$ of said element processors of said $CL_u$
simultaneously calculate (132) through (140) for j such that pk + t + 2 ≤ j ≤ n for t = 1, 2, ..., k - 1 if
n is a multiple of k or p < [n/k] and for t = 1, 2, ..., n - [n/k]k otherwise.

14. The equipment as claimed in claim 13 wherein said main controller distributes and assigns the rows of
said coefficient matrix $A^{(0)}$ and the components of said $b^{(0)}$ and x to said clusters in such a manner as
each block of consecutive k rows and corresponding 2k components is transmitted to said memory of
said one cluster in the cyclic order of $CL_0$, ..., $CL_{P-1}$, $CL_0$, $CL_1$, ..., and, if n is a multiple of k,
instructs said each cluster to execute said parallel preprocessing $CLA_1$ through $CLA_k$, said parallel
updating $B'_c$ and said parallel postelimination $C_c$ for p = 0, 1, ..., n/k - 1, and if n is not a multiple of
k, instructs said each cluster to execute said parallel preprocessing $CLA_1$ through $CLA_k$, said parallel
updating $B'_c$, and said postelimination $C_c$ for p = 0, 1, ..., [n/k] - 1 and to execute said parallel
preprocessing $CLA_1$ through $CLA_{n-[n/k]k}$, said parallel updating $B'_c$, and said postelimination $C_c$ for
p = [n/k].
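The three phases named in claims 12 to 14 (preprocessing of the pivot rows, updating of every row outside the block, and post-elimination inside the block) can be emulated sequentially as below. This is a sketch under stated assumptions rather than the claimed equipment: the function name `gauss_jordan_multi_pivot` is hypothetical, indices are 0-based, pivot choosing is omitted (every pivot is assumed nonzero), and the parallel work of the element processors is replaced by plain loops.

```python
def gauss_jordan_multi_pivot(A, b, k):
    """Solve A x = b by Gauss-Jordan elimination with up to k pivots per step.

    Assumes every pivot met on the diagonal is nonzero (no interchanges).
    """
    n = len(A)
    a = [[float(v) for v in row] for row in A]
    rhs = [float(v) for v in b]

    for start in range(0, n, k):
        size = min(k, n - start)               # the last step may be shorter
        block = range(start, start + size)

        # Preprocessing: reduce and normalise the pivot rows of the block.
        for r in block:
            for q in range(start, r):
                reg = a[r][q]
                for j in range(q, n):
                    a[r][j] -= reg * a[q][j]
                rhs[r] -= reg * rhs[q]
            pivot = a[r][r]
            if pivot == 0.0:
                raise ZeroDivisionError("zero pivot; pivot choosing is not sketched here")
            for j in range(r, n):
                a[r][j] /= pivot
            rhs[r] /= pivot

        # Updating: every row outside the block, above and below it,
        # removes all pivots of the block in one pass.
        for i in range(n):
            if start <= i < start + size:
                continue
            regs = []                          # multipliers held in the register set
            for m, q in enumerate(block):
                regs.append(a[i][q] - sum(regs[l] * a[start + l][q] for l in range(m)))
            for j in range(start + size, n):
                a[i][j] -= sum(regs[m] * a[start + m][j] for m in range(size))
            rhs[i] -= sum(regs[m] * rhs[start + m] for m in range(size))
            for q in block:
                a[i][q] = 0.0

        # Post-elimination: clear the entries above the diagonal inside the block.
        for t in range(1, size):
            p_row = start + t
            for r in range(start, p_row):
                reg = a[r][p_row]
                for j in range(p_row, n):
                    a[r][j] -= reg * a[p_row][j]
                rhs[r] -= reg * rhs[p_row]

    return rhs                                 # the matrix is now the identity


if __name__ == "__main__":
    print(gauss_jordan_multi_pivot([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3], 2))
```

Because rows above the block are updated as well, no back-substitution is needed at the end: the right-hand side holds the solution once the last block has been processed.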

15. A parallel computer composed of C clusters $CL_1$, ..., $CL_C$ connected by a network, said each cluster
comprising $P_c$ element processors and a shared memory that stores part of each coefficient matrix $A^{(r)}$
and each known vector $b^{(r)}$ and an unknown vector x expressed by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$x = (x_1, x_2, \ldots, x_n)^t$

for 0 ≤ r ≤ n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

said parallel computer comprising:

a data distribution means which distributes the rows of said coefficient matrix $A^{(0)}$ and the
components of said vectors $b^{(0)}$ and x to said shared memory of said clusters in such a manner as
each block of consecutive k rows and corresponding 2k components is transmitted to said shared
memory in the cyclic order of $CL_1$, ..., $CL_C$, $CL_1$, $CL_2$, ..., and assigns those distributed to said
cluster's shared memory to its said element processors row by row,

a pivot choosing means which chooses a pivot in a row assigned to said each element processor,

an elementary pre-elimination means which, after said pivot choosing means chooses the pivot

$a_{kP_c+1,\,kP_c+1}^{(kP_c)}$,

calculates



in said element processor in charge of the ($kP_c + 1$)th row, transmits the results to said shared
memory of said every other cluster to which said element processor in charge of an i-th row such that
$kP_c + 1 \le i \le n$ belongs, and, for l = 2, ..., $P_c$, calculates
for $kP_c + l \le i \le n$ in said element processor in charge of the i-th row, calculates
a kP c +lJ ° a kP^lj a kP c +lkPsi a kP c *lj 2u a ±*V»* ' 



in said element processor in charge of the ($kP_c + l$)th row, and, after said pivot choosing means
determines the pivot

$a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)}$,

calculates

$a_{kP_c+l,\,j}^{(kP_c+l)} = a_{kP_c+l,\,j}^{(kP_c+l-1)} / a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)},$

$b_{kP_c+l}^{(kP_c+l)} = b_{kP_c+l}^{(kP_c+l-1)} / a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)}$

in said element processor in charge of the ($kP_c + l$)th row, transmits the results
to said shared memory of every other cluster to which said element processor in charge of an i-th row
such that $kP_c + l + 1 \le i \le n$ belongs,

a multi-pivot elimination means which calculates 

50 1>»2 
in said each element processor in charge of the i-th row such that $(k + 1)P_c + 1 \le i \le n$,

a means for testing if the operation of said multi-pivot elimination means was repeated $[n/P_c]$ times,
and

a remainder elimination means that executes said elementary pre-elimination means for the
($[n/P_c]P_c + 1$)th row through the n-th row, if said testing means judges that the operation of said
multi-pivot elimination means was executed $[n/P_c]$ times, and n is not a multiple of $P_c$.

16. The parallel computer as claimed in claim 15, further including 
an elementary back-substitution means that calculates 

in said element processor in charge of the i-th row after the elimination process of said parallel
elimination method,

an elementary back-transmission means that transmits $x_i$ to said shared memory of said every
cluster to which said element processor in charge of an h-th row such that 1 ≤ h ≤ i - 1 belongs,

an elementary back-calculation means which calculates

for 1 ≤ h ≤ i - 1 in said element processor in charge of the h-th row, and

a means for testing if the operation of said elementary back-substitution means was repeated from i 
= n to i = 1. 

17. The parallel computer as claimed in claim 15 wherein said pivot choosing means comprises:

a search means whereby said element processor searches for a nonzero element in the order of
increasing column numbers from the diagonal element in the same row, if a diagonal element of said
coefficient matrix $A^{(r)}$ is 0,

a column number broadcasting means that notifies said other element processors of the column
number of a nonzero element found by said search means,

an element interchange means whereby said each element processor interchanges the two
elements which are in its charge and have the same column numbers as the above diagonal zero
element and the found nonzero element, and

a component interchange means whereby said two element processors interchange the two
components of said unknown vector which are in their charge and have the same component indices as
the column numbers of said diagonal zero element and said found nonzero element.

18. The parallel computer as claimed in claim 15 wherein said pivot choosing means comprises:

a search means whereby said element processor searches for an element with the greatest
absolute value in the order of increasing column numbers from a diagonal element in the same row,

a column number broadcasting means that notifies said other element processors of the column
number of an element found by said search means,

an element interchange means whereby said each element processor interchanges the two
elements which are in its charge and have the same column number as said diagonal element and the
found element, and

a component interchange means whereby said two element processors interchange the two
components of the unknown vector which are in their charge and have the same component indices as
the column numbers of said diagonal element and said found component.
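The two search rules of the pivot choosing claims (first nonzero element when the diagonal element is zero, or element of greatest absolute value) and the interchanges that follow can be sketched as below. The helper names `choose_pivot_column` and `interchange_columns` are hypothetical, indices are 0-based, and the broadcast of the column number to the other element processors is represented simply by the return value.

```python
def choose_pivot_column(row, diag, largest=True):
    """Return the column index of the pivot in `row`, searching from `diag` rightwards.

    largest=True  : element of greatest absolute value (first one on ties).
    largest=False : the diagonal element if nonzero, else the first nonzero
                    element in order of increasing column number.
    """
    if largest:
        return max(range(diag, len(row)), key=lambda j: abs(row[j]))
    for j in range(diag, len(row)):
        if row[j] != 0:
            return j
    raise ValueError("no nonzero element from the diagonal onwards")


def interchange_columns(a, unknowns, c1, c2):
    """Swap columns c1 and c2 in every row and the matching unknown components.

    Each row swap stands for the element interchange done by the processor
    in charge of that row; the swap in `unknowns` stands for the component
    interchange of the unknown vector.
    """
    for row in a:
        row[c1], row[c2] = row[c2], row[c1]
    unknowns[c1], unknowns[c2] = unknowns[c2], unknowns[c1]


if __name__ == "__main__":
    a = [[1.0, 2.0, 3.0], [0.0, 0.0, 5.0], [0.0, 4.0, 6.0]]
    labels = ["x1", "x2", "x3"]
    c = choose_pivot_column(a[1], 1, largest=False)
    interchange_columns(a, labels, 1, c)
    print(c, a, labels)
```

Keeping track of the swapped labels is what allows the solution components to be returned in their original order after elimination.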

19. A parallel computer composed of C clusters $CL_1$, ..., $CL_C$ connected by a network, said each cluster
comprising $P_c$ element processors and a shared memory which stores part of each coefficient matrix
$A^{(r)}$ and each known vector $b^{(r)}$ and an unknown vector x expressed by

$A^{(r)} = (a_{ij}^{(r)}), \quad 1 \le i, j \le n,$

$x = (x_1, x_2, \ldots, x_n)^t$

for 0 ≤ r ≤ n, for a given system of linear equations

$A^{(0)} x = b^{(0)},$

said parallel computer comprising:

a data distribution means which distributes the rows of said coefficient matrix $A^{(0)}$ and the
components of said vectors $b^{(0)}$ and x to said shared memory of said clusters in such a manner as
each block of consecutive k rows and corresponding 2k components is transmitted to said shared
memory in the cyclic order of $CL_1$, ..., $CL_C$, $CL_1$, $CL_2$, ..., and assigns those distributed to said
cluster's shared memory to its said element processors row by row,

a pivot choosing means which chooses a pivot in a row assigned to said each element processor,

an elementary pre-elimination means which, after said pivot choosing means chooses the pivot

$a_{kP_c+1,\,kP_c+1}^{(kP_c)}$,

calculates

$a_{kP_c+1,\,j}^{(kP_c+1)} = a_{kP_c+1,\,j}^{(kP_c)} / a_{kP_c+1,\,kP_c+1}^{(kP_c)}$

in said element processor in charge of the ($kP_c + 1$)th row, transmits the results to said shared
memory of said every other cluster to which said element processor in charge of an i-th row such that
$kP_c + 1 \le i \le n$ belongs, and, for l = 2, ..., $P_c$, calculates



1 = a ikP c *l a ikP e +l a kP € +ikP c +l 2-r Ci a kP e *mkP^l 



for $kP_c + l \le i \le n$ in said element processor in charge of the i-th row, calculates
in said element processor in charge of the ($kP_c + l$)th row, and, after said pivot choosing means
determines the pivot



$a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)}$,

calculates

$a_{kP_c+l,\,j}^{(kP_c+l)} = a_{kP_c+l,\,j}^{(kP_c+l-1)} / a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)},$

$b_{kP_c+l}^{(kP_c+l)} = b_{kP_c+l}^{(kP_c+l-1)} / a_{kP_c+l,\,kP_c+l}^{(kP_c+l-1)}$

in said element processor in charge of the ($kP_c + l$)th row, transmits the results
to said shared memory of every other cluster to which said element processor in charge of an i-th row
such that $kP_c + l + 1 \le i \le n$ belongs,

a multi-pivot elimination means that calculates 

a iJ a i* a i«Vl a *P 0 **i " Ci a kP e +mj # 
in said each element processor in charge of the i-th row such that $(k + 1)P_c + 1 \le i \le n$,






EP 0 523 544 A2 



an elementary post-elimination means which calculates 




(r*l) 



in said element processor in charge of the i-th row, 

a post-elimination processing means which calculates 



- Jt>/ 



for l = -w + q + 1 for w = 1, ..., q and q = 1, ..., $P_c - 1$ for $kP_c + 1 \le i \le kP_c + q$ in said
element processor in charge of the i-th row,

a means for testing if the operation of said post-elimination means was executed $[n/P_c]$ times, and

a remainder elimination means that executes said elementary pre-elimination means for the
($[n/P_c]P_c + 1$)th through the n-th rows and executes said multi-pivot elimination means and said
post-elimination means, if the testing means judges that the operation of said post-elimination means was
executed $[n/P_c]$ times.

20. The parallel computer as claimed in claim 18 wherein said pivot choosing means comprises:

a search means whereby said element processor searches for a nonzero element in the order of 
increasing column numbers from the diagonal element in the same row, if a diagonal element of said 
coefficient matrix A (r) is 0, 

a column number broadcasting means that notifies said other element processors of the column 
number of a nonzero element found by said search means, 

an element interchange means whereby said each element processor interchanges the two 
elements which are in its charge and have the same column numbers as the above diagonal zero 
element and the found nonzero element, and 

a component interchange means whereby said two element processors interchange the two 
components of said unknown vector which are in their charge and have the same component indices as 
the column numbers of said diagonal zero element and said found nonzero element. 

21. The parallel computer as claimed in claim 18 wherein said pivot choosing means comprises:

a search means whereby said element processor searches for an element with the greatest 
absolute value in the order of increasing column numbers from a diagonal element in the same row, 

a column number broadcasting means that notifies said other element processors of the column 
number of an element found by said search means, 

an element interchange means whereby said each element processor interchanges the two 
elements which are in its charge and have the same column number as said diagonal element and the 
found element, and 

a component interchange means whereby said two element processors interchange the two
components of the unknown vector which are in their charge and have the same component indices as
the column numbers of said diagonal element and said found component.